**Discover your next playlist from another decade**

**Million Song Dataset**

Big Data and Cloud Computing Final Project

Chua | Delos Santos | Singson

Executive Summary

Hans Zimmer, an international award-winning composer and music producer who has composed music for over 100 films, describes music as a means to rediscover your humanity and your connection to humanity. This study attempts to bridge generations through song recommendations across decades. Using the Million Song Dataset from the Laboratory for the Recognition and Organization of Speech and Audio (LabROSA), we built a content-based recommendation engine based on similarity metrics, namely Cosine and Jaccard similarity. It takes a song of your preference and recommends several songs from another decade of your choice that you might find similar or interesting. Among our test users, we achieved a best mean song-similarity rating of 6.1/10 and a mean of 7.25/10 for how much the user liked the recommendation.

Problem Statement

According to the Global Music Report 2018, the global recorded music market grew by 8.1% in 2017. Patronage of the digital market (e.g. Spotify) accounted for 54% of the global music market. Of the 176 million paid subscribers, 36% were new accounts. These numbers show how large the impact of digitalization is and suggest that it will continue to shape the industry.

The music industry continuously works to develop new songs and artists. At the same time, older songs are preserved: a wide range of songs remains available, including those from previous decades. This helps listeners discover songs that are similar to their current preferences. For instance, a 19-year-old who would naturally have listened to 2000s songs might like Michael Jackson songs. Similarly, a 40-year-old woman who mostly listened to songs from the 1980s might be interested in knowing more about Lady Gaga. It is a great opportunity to discover songs created outside their generation and to expand their song choices. Contestants in singing competitions such as The Voice and American Idol commonly do this.

In this study, the team focuses on creating a recommender system that suggests songs from a preferred decade that are similar to a chosen song from another decade. Similarity is defined according to features such as tempo and loudness. The team hopes to bridge generations through similar music that is decades apart.

Data Description

The dataset used in this study is the Million Song Dataset (MSD) from the Laboratory for the Recognition and Organization of Speech and Audio (LabROSA), created in collaboration with Echo Nest. MSD was created for research purposes, primarily music information retrieval. It holds approximately 280 GB of data covering songs from 1920 to 2010. The data is provided in HDF5 format, which has the following 55 features:

analysis sample rate (float)-sample rate of the audio used
artist 7digitalid (int)- ID from 7digital.com or -1
artist familiarity (float)- algorithmic estimation
artist hotttnesss (float)- algorithmic estimation
artist id (string) - Echo Nest ID
artist latitude (float) - latitude
artist location (string)- location name
artist longitude (float)- longitude
artist mbid (string) - ID from musicbrainz.org
artist mbtags array (string) - tags from musicbrainz.org
artist mbtags count array (int)- tag counts for musicbrainz tags
artist name (string)- artist name
artist playmeid (int)- ID from playme.com, or -1
artist terms array (string)- Echo Nest tags
artist terms freq array (float)- Echo Nest tags freqs
artist terms weight array (float)- Echo Nest tags weight
audio md5 (string)- audio hash code
bars confidence array (float)- confidence measure
bars start array (float)- beginning of bars, usually on a beat
beats confidence array (float)- confidence measure
beats start array (float) - result of beat tracking
danceability (float) - algorithmic estimation
duration (float) - in seconds
end of fade in (float) - seconds at the beginning of the song
energy (float)- energy from listener point of view
key (int)-key the song is in
key confidence (float)- confidence measure
loudness (float)- overall loudness in dB
mode (int)- major or minor
mode confidence (float)- confidence measure
release (string)- album name
release 7digitalid (int)- ID from 7digital.com or -1
sections confidence array (float)- confidence measure
sections start array (float)- largest grouping in a song, e.g. verse
segments confidence array (float)- confidence measure
segments loudness max array (float)- max dB value
segments loudness max time array (float)- time of max dB value, i.e. end of attack
segments loudness max start array (float)- dB value at onset
segments pitches 2D array (float) - chroma feature, one value per note
segments start array (float)- musical events, ~ note onsets
segments timbre 2D array (float)- texture features (MFCC+PCA-like)
similar artists array (string) - Echo Nest artist IDs (sim. algo. unpublished)
song hotttnesss (float)- algorithmic estimation
song id (string)- Echo Nest song ID
start of fade out (float)- time in sec
tatums confidence array (float)- confidence measure
tatums start array (float)- smallest rhythmic element
tempo (float)- estimated tempo in BPM
time signature (int)- estimate of number of beats per bar, e.g. 4
time signature confidence (float)- confidence measure
title (string)- song title
track id (string)- Echo Nest track ID
track 7digitalid (int) - ID from 7digital.com or -1
year (int) - song release year from MusicBrainz or 0

Methodology

The dataset far exceeds the 8 GB of RAM allocated to each user on the ACCESS Lab supercomputer (Jojie). Because of this, Spark was chosen to decrease processing time and, through the Spark Cluster, to load data larger than the available RAM. Unfortunately, the Spark Cluster was not accessible during the timeframe of this project, and the team was limited to the RAM available to a single user. The team therefore loaded a sample of the dataset: 10%, or 100k songs, with 8 fields per song. The 100k file paths were stored in a list using glob. A Pandas-based read function was created and parallelized with SparkContext.parallelize, which distributes the list of paths and maps each path to the read function. This is essentially a parallel Pandas read using Spark.

Once the data has been loaded, the Pandas dataframe is saved to a Parquet file to avoid reprocessing. The Parquet file is then loaded as a Spark Dataframe for multicore processing. From here the process is straightforward: vectorize the features using the vector assembler, create a table for each decade of songs, then compare the song of choice against the entire decade to determine the most similar songs. This is expounded in the subsections on building recommender systems using Cosine Similarity and Jaccard Similarity (Section 6).
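The last two steps (one table per decade, then ranking a decade against a chosen song) can be sketched in plain Python without Spark. This is only an illustration under stated assumptions: the record layout, the function names, and the song records are hypothetical, and the similarity function is a stand-in passed as a parameter, not the Cosine or Jaccard metric used in the report.

```python
from collections import defaultdict


def decade_of(year):
    """Map a release year to its decade start (1978 -> 1970); 0 means unknown."""
    return None if year <= 0 else (year // 10) * 10


def split_by_decade(songs):
    """Build one table (list of songs) per decade, skipping unknown years."""
    tables = defaultdict(list)
    for song in songs:
        decade = decade_of(song["year"])
        if decade is not None:
            tables[decade].append(song)
    return dict(tables)


def recommend(query, decade_table, similarity, top_n=3):
    """Rank a decade's songs by similarity to the query, most similar first."""
    ranked = sorted(
        decade_table,
        key=lambda song: similarity(query["features"], song["features"]),
        reverse=True,
    )
    return ranked[:top_n]


def negative_squared_distance(a, b):
    """Stand-in similarity (NOT the report's metric): closer vectors score higher."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical records; features are [tempo, loudness].
songs = [
    {"title": "query", "year": 2004, "features": [86.6, -10.7]},
    {"title": "fast 70s", "year": 1978, "features": [170.9, -13.4]},
    {"title": "slow 70s", "year": 1975, "features": [90.0, -9.0]},
    {"title": "undated", "year": 0, "features": [86.6, -10.7]},
]
tables = split_by_decade(songs)
top = recommend(songs[0], tables[1970], negative_squared_distance, top_n=1)
print(top[0]["title"])  # slow 70s
```

Keeping the similarity function as a parameter mirrors the report's design, where the same decade tables feed both the Cosine and the Jaccard recommender.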

For validation, we asked 8 MSDS students, including the team, to rate four recommended songs against their chosen base song on a scale of 1-10 (10 being the highest). Of the four recommended songs, two came from the Cosine similarity based model and two from the Jaccard similarity based model. Each song received two ratings: first, how similar the recommended song from a different decade is to the base song; second, how much the rater likes the recommended song.

Loading 1M Songs
There are a few strategies for loading all 1 million songs, but they take a lot of time. Because Spark does not have a native HDF5 reader, the team converted each HDF5 file to a CSV, resulting in 1 million CSVs. One might expect to combine all these CSVs into one immediately using Linux shell commands, but commands like ls run into inherent limits when listing all 1 million CSVs. We therefore need to group the CSVs into bins, combine the files within each bin, and then combine the per-bin files. We chose bins of 50k songs each. The entire process is long and troublesome, but it is a decent workaround for not having a Spark Cluster to work with. For the purpose of demonstrating the recommender system, however, we stick with the 100k dataset.
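The bin-then-combine workaround can be sketched in plain Python as follows. This is a minimal sketch, not the team's actual script: the function names, the intermediate file naming, and the assumption that every CSV starts with the same header row are all hypothetical.

```python
import csv
import os
import tempfile


def concat_csvs(paths, out_path):
    """Concatenate CSV files that share a header, writing the header once."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        header_written = False
        for path in paths:
            with open(path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                writer.writerows(reader)


def combine_in_bins(paths, out_path, bin_size=50_000, work_dir=None):
    """Combine many CSVs in two passes: per-bin files first, then one final file."""
    work_dir = work_dir or tempfile.mkdtemp()
    bin_files = []
    for start in range(0, len(paths), bin_size):
        # Hypothetical naming scheme for the intermediate per-bin files.
        bin_path = os.path.join(work_dir, f"bin_{start // bin_size:05d}.csv")
        concat_csvs(paths[start:start + bin_size], bin_path)
        bin_files.append(bin_path)
    # Second pass: the per-bin files are few enough to combine directly.
    concat_csvs(bin_files, out_path)
    return bin_files
```

With 1 million CSVs and the default bin size of 50k, the first pass would write 20 intermediate files, and the second pass merges those 20 into the final CSV.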

Loading the Dataset

We loaded the Million Song Dataset in parallel for Spark processing. Given the limits on processing power and time, we decided to store reduced data samples in Parquet.

In [1]:
import h5py
import glob
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql import functions as F

sc = SparkContext('local[4]')
session = SparkSession(sc)
In [ ]:
#TAKE A RANDOM SAMPLE OF 100K SONGS
import glob
import numpy as np
np.random.seed(1337)
file_dirs = np.random.choice(glob.glob('/mnt/data/public/millionsong/*/*/*/*.h5'),200000).tolist()
In [ ]:
#reader function for the parallelization, returns a Pandas dataframe containing the song name and attributes of the song
import pandas as pd

def reader(path):
    #read the three song tables stored in the HDF5 file
    df1 = pd.read_hdf(path,'metadata/songs')
    df2 = pd.read_hdf(path,'analysis/songs')
    df3 = pd.read_hdf(path,'musicbrainz/songs')
    #join the tables column-wise and keep only the fields needed downstream
    return pd.concat([df1,df2,df3],axis=1)[['danceability','duration','energy','key','loudness','tempo','time_signature','title','artist_name','song_id','year',
                                           'song_hotttnesss']]
    
In [ ]:
from functools import reduce

Create parquet for 25k songs

In [ ]:
#taking a smaller subset of files just to test if parallel read is working, use the variable file_dirs for all 100k 
#songs when you are sure
temp_files = file_dirs[0]
In [ ]:
rdd_parallel = sc.parallelize(file_dirs).map(lambda x: reader(x))
In [ ]:
#this is one way, but the result is a Pandas dataframe, not a Spark dataframe

reduced = pd.concat(rdd_parallel.collect())
In [ ]:
reduced.info()
In [ ]:
reduced.to_parquet('25ksongs_LT10.parquet.gzip',compression='gzip')

Create parquet for 100k songs

In [ ]:
#taking a smaller subset of files just to test if parallel read is working, use the variable file_dirs for all 100k 
#songs when you are sure
temp_files = file_dirs[0:5]
In [ ]:
rdd_parallel = sc.parallelize(file_dirs).map(lambda x: reader(x))
In [ ]:
#this is one way, but the result is a Pandas dataframe, not a Spark dataframe

reduced = pd.concat(rdd_parallel.collect())
In [ ]:
reduced.info()
In [ ]:
reduced.to_parquet('100ksongs_LT10.parquet.gzip',compression='gzip')

Exploratory Data Analysis

For the EDA, the team used a smaller but representative subset of the data to identify key insights about the dataset.

Load reduced dataset with ~45k songs

In [1]:
import pandas as pd
import numpy as np
import re
import ipywidgets as widgets
import seaborn as sns

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt

import plotly.plotly as py
import plotly.graph_objs as go
import plotly.offline as pyoff
from IPython.display import display
from IPython.display import IFrame

pyoff.init_notebook_mode(connected = False)

%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
In [2]:
reduced=pd.read_parquet('25ksongs_LT10.parquet.gzip')
In [3]:
df = reduced[['artist_hotttnesss', 'artist_latitude','artist_longitude','artist_name','song_hotttnesss',
              'song_id', 'title','duration','key','loudness','tempo','year','genre_tags']]

df.to_csv('data.csv')
df = pd.read_csv('data.csv')

Insight 1

While playing with the dataset, the team asked: how often are song titles duplicated? Surprisingly, a single title is shared by 30 songs! Disappointingly, that title is Intro, which artists commonly use for the instrumental track at the beginning of a CD album. Below we also show the song titles that were used at least 6 times.

In [4]:
df_xy = df.groupby('title')[
    ['song_id']].count()
In [5]:
#description for duplicate song titles
df_xy.describe()
Out[5]:
song_id
count 23560.000000
mean 1.061078
std 0.373533
min 1.000000
25% 1.000000
50% 1.000000
75% 1.000000
max 30.000000
In [6]:
#song titles with greater than 5 duplicates.
df_xy[df_xy['song_id']>5]
Out[6]:
song_id
title
Ave Maria 6
Free 6
Home 6
I Love You 7
Interlude 7
Intro 30
Maybe 6
Outro 10
Shine 8
Silent Night 6
Summertime 6
Untitled 8
Wake Up 7
Winter Wonderland 6

Insight 2

On average, songs are rated only 0.36. The team found this strange and investigated further. It turns out that a significant number of songs are marked as 0 or NaN in the dataset. Of the almost 45k songs in this Dataframe, approximately 30k have no hotttnesss rating. This can mean either that the song was a complete flop or that it was not popular enough to be given a rating. The team therefore decided to take the mean while excluding songs with 0 or missing hotness. Computed this way, the mean hotness is 0.49 with a standard deviation of 0.16.

In [7]:
#mean rating after dropping NaN (zeros are still included)
df.song_hotttnesss.dropna().mean()
Out[7]:
0.3573337341275508
In [8]:
#number of NaN in the hotttnesss field
len(df.song_hotttnesss) - len(df.song_hotttnesss.dropna())
Out[8]:
30016
In [9]:
#mean hotness
df_nonan = df.dropna()
df_nonan[df_nonan['song_hotttnesss']>0].song_hotttnesss.mean()
Out[9]:
0.48510242294624994
In [10]:
#standard deviation of hotness
df_nonan[df_nonan['song_hotttnesss']>0].song_hotttnesss.std()
Out[10]:
0.16387246838601144
In [11]:
#Distribution of song hotness
sns.distplot(df_nonan[df_nonan['song_hotttnesss']>0].song_hotttnesss, hist=True, kde=False, 
             bins = 100,color = 'blue',
             hist_kws={'edgecolor':'black'})
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f0da2c403c8>

Insight 3

Geographically, the majority of the songs originated from North America and Europe. It can also be observed visually that most songs have a song_hotttnesss within the 0.4-0.6 range.

In [12]:
df_coordinates = df
df_xy = df
data_xy = [go.Scattergeo(
    lon=df_coordinates['artist_longitude'],
    lat=df_coordinates['artist_latitude'],
    text=df_coordinates['title'].values,
    mode='markers',
    hoverinfo='text',
    marker=dict(
        color=df_xy['song_hotttnesss'],
        cmax=df_xy['song_hotttnesss'].max(),
        colorbar=dict(
            title="Song Hotness"
        ))

)]

layout_xy = go.Layout(
    title='Where Did the Songs Originate?',
    geo=dict(
        scope="world",
        showframe=False,
        showcoastlines=False,
        showland=True,
        landcolor="rgb(150, 150, 150)",
        countrycolor="rgb(255, 255, 255)",
        coastlinecolor="rgb(255, 255, 255)",

    )
)

fig_xy = go.Figure(data=data_xy, layout=layout_xy)
pyoff.iplot(fig_xy)
In [13]:
genre_year = df[['genre_tags','year']].dropna()
In [14]:
genre_year['year']= genre_year.year.astype(int)
In [15]:
len(genre_year.genre_tags.unique())
Out[15]:
609

Insight 4

The bar graph presents the 20 most common genres from the 1920s to 2010. The genres classic pop and rock and uk have the greatest number of songs. Notice also that many of the top genres are rock variants, differing only in whether they are classic pop and rock, rock and indie, rock, or alternative rock. This graph shows how influential rock songs have been throughout the decades.

In [16]:
#Top 20 genres
genre_year.genre_tags.value_counts().nlargest(20).plot(kind='barh',figsize=(10,10))
plt.xlabel('Count of Songs')
plt.ylabel('Genre')
plt.title('Top 20 Genres of All Time')
Out[16]:
Text(0.5,1,'Top 20 Genres of All Time')
In [17]:
GENRES2 = ['classic pop and rock', 'uk', 'rock and indie', 'folk', 'british', 'american', 'hip hop mb and dance hall',
           'punk', 'german', 'french', 'rock', 'finnish', 'country', 'jazz and blues', 
           'pop and chart','jazz','alternative rock','production music','dance and electronica', 'soul and reggae']

#map each genre to its column index in the count matrix
gen2 = {each: i for i, each in enumerate(GENRES2)}

pipeline2 = Pipeline([('cc', CountVectorizer(vocabulary=gen2))])
In [18]:
df2 =genre_year
In [19]:
gen_data2 = dict()
#sum the genre counts of all songs released within each decade, 1920s through 2010s
for start in range(1920, 2020, 10):
    in_decade = (df2['year'] >= start) & (df2['year'] < start + 10)
    gen_data2[str(start)] = pipeline2.fit_transform(
        df2[in_decade]['genre_tags'].dropna()).toarray().sum(axis=0)
In [20]:
df3 = pd.DataFrame.from_dict(gen_data2)
df3.index=GENRES2
In [21]:
df3.to_csv('genre_year.csv')
In [22]:
df3= pd.read_csv('genre_year.csv')
In [23]:
df3.columns = ['Genre','1920','1930','1940','1950','1960','1970','1980','1990','2000','2010']
df3 = df3.set_index('Genre')

Insight 5

When segmented by decade, most songs come from the 2000s, and song counts decrease the further back the decade. Artists consistently create classic pop and rock songs over time, contrary to the common notion that the 2000s were a pop generation.

In [24]:
data_plot = [
    go.Scatter(
        x=df3.T.index,
        y=df3.T[each],
        name = each
    ) for each in df3.T.columns
]

layout = go.Layout(
    title = 'What Song Genres Do People Listen Over Time?',
    yaxis = dict( title = 'Count of Songs')
)
fig = go.Figure(data=data_plot, layout=layout)
pyoff.iplot(fig)

Insight 6

Among all the features, including artist_hotttnesss, duration, key, loudness, tempo and year, artist_hotttnesss correlates the most with song_hotttnesss. It is intuitive that a song from a well-known artist usually receives a higher song rating than a song from a new artist.

In [25]:
#How is Song hotness related to other features
import seaborn as sns
corr = (df[['song_hotttnesss', 'artist_hotttnesss','duration','key','loudness','tempo','year']].dropna()).corr()
In [26]:
sns.heatmap(corr, robust=True,linewidths=0.2,cmap='viridis').set_title('How Do Other Features Correlate to Song Hotness?')
Out[26]:
Text(0.5,1,'How Do Other Features Correlate to Song Hotness?')
In [27]:
#Hottest Songs of All Time
hottest_songs = df[['song_hotttnesss','title']].dropna()
hottest_songs = hottest_songs.drop_duplicates()
In [28]:
top_songs = hottest_songs.sort_values(by=['song_hotttnesss'], ascending=False)
top_songs = top_songs.set_index('title')

Insight 7

The bar graph below shows the 50 hottest songs of all time. Values of song_hotttnesss range from 0 to 1, and only three songs received a perfect rating of 1: White Room, When A Man Loves A Woman and Bitter Sweet Symphony.

In [29]:
top_songs.head(50).plot(kind='barh',figsize=(20,20))
plt.xlabel('Song Hotness')
plt.ylabel('Song Title')
plt.title('Hottest Songs of All Time')
Out[29]:
Text(0.5,1,'Hottest Songs of All Time')
In [30]:
#How do people identify hot songs?
avg = df.groupby('title')[['tempo','artist_hotttnesss','loudness','duration']].mean().dropna()
In [31]:
htf = df[['song_hotttnesss','artist_hotttnesss','title']].groupby(
    'title')[['song_hotttnesss', 'artist_hotttnesss']].max().dropna()
In [32]:
q1 = htf[(htf['artist_hotttnesss'] > htf['artist_hotttnesss'].mean()) & (
    htf['song_hotttnesss'] > htf['song_hotttnesss'].mean())]

q2 = htf[(htf['artist_hotttnesss'] <= htf['artist_hotttnesss'].mean()) & (
    htf['song_hotttnesss'] > htf['song_hotttnesss'].mean())]

q3 = htf[(htf['artist_hotttnesss'] <= htf['artist_hotttnesss'].mean()) & (
    htf['song_hotttnesss'] <= htf['song_hotttnesss'].mean())]

q4 = htf[(htf['artist_hotttnesss'] > htf['artist_hotttnesss'].mean()) & (
    htf['song_hotttnesss'] <= htf['song_hotttnesss'].mean())]
In [33]:
avg1 = avg.loc[q1.index,:].mean()
avg2 = avg.loc[q2.index,:].mean()
avg3 = avg.loc[q3.index,:].mean()
avg4 = avg.loc[q4.index,:].mean()
In [34]:
avgs = np.array([avg1,avg2,avg3,avg4])
In [35]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
avgs_norm = scaler.fit_transform(avgs)

Insight 8

The song statistics provide information on what makes a song a hit or a flop. Songs are divided into four classes: Classic Hit Songs, New Hit Releases, F-list Songs and Bad New Releases. Classic Hit Songs have a wide range of loudness, tempo and artist_hotttnesss while having duration. New Hit Releases follow the Classic Hit Songs' formula, with songs that are louder and have a faster beat while being sung by new artists. F-list Songs typically have high duration and low tempo. Bad New Releases, on the other hand, comprise songs from known artists that have high duration and in turn received a low song rating.

In [36]:
layout_avgs = go.Layout(
    title='Song Statistics',
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',)

data1 = [go.Scatterpolar(r=avgs_norm[0], theta=['Tempo','Artist Hotness','Loudness','Duration'], fill='toself', name='Classic Hit Songs'), 
         go.Scatterpolar(r=avgs_norm[1], theta=['Tempo','Artist Hotness','Loudness','Duration'], fill='toself', name='New Hit Releases'),
        go.Scatterpolar(r=avgs_norm[2], theta=['Tempo','Artist Hotness','Loudness','Duration'], fill='toself', name='F-list Songs'),
        go.Scatterpolar(r=avgs_norm[3], theta=['Tempo','Artist Hotness','Loudness','Duration'], fill='toself', name='Bad New Releases')]
fig_grp=go.Figure(data=data1, layout=layout_avgs)
pyoff.iplot(fig_grp)
In [38]:
tempo_yr = df[['tempo','year']]
tempo_yr=tempo_yr.dropna()
fin_tempo_yr= tempo_yr[tempo_yr['year'] != 0.0]
def decade(i):
    if 1920.0 <=i <1930.0:
        return '1920'
    if 1930.0 <=i < 1940.0:
        return '1930'
    if 1940.0 <=i <1950.0:
        return '1940'
    if 1950.0 <=i < 1960.0:
        return '1950'
    if 1960.0 <=i <1970.0:
        return '1960'
    if 1970.0 <=i < 1980.0:
        return '1970'
    if 1980.0 <=i <1990.0:
        return '1980'
    if 1990.0 <=i < 2000.0:
        return '1990'
    if 2000.0 <=i < 2010.0:
        return '2000'
    else:
        return '2010'
fin_tempo_yr['decade'] = fin_tempo_yr['year'].apply (lambda i: decade (i))

Insight 9

The graph below illustrates an increasing trend in tempo over time. From the late 90s to the 2000s, the music industry continues to evolve, offering a wide range of music not limited to a certain tempo bracket.

In [39]:
from matplotlib import pyplot
import seaborn as sns
fig, ax = pyplot.subplots(figsize=(20,10))
sns.boxplot(fin_tempo_yr['decade'], fin_tempo_yr['tempo'])
ax.set_title('How Does Tempo Change Over Time?')
Out[39]:
Text(0.5,1,'How Does Tempo Change Over Time?')

Here's another illustration of how tempo changes over time.

In [40]:
fig,ax = pyplot.subplots(figsize=(20,10))
sns.regplot(fin_tempo_yr['year'],fin_tempo_yr['tempo'],marker='+',fit_reg=False)
ax.set_title("How does Tempo change overtime?")
Out[40]:
Text(0.5,1,'How does Tempo change overtime?')

Building the Recommender System

Model Assumptions and Limitations

  1. Only English-titled tracks are considered during our testing (recommending process).
  2. Large contrasts in genre are omitted to improve model accuracy (this was done heuristically during recommendation, as the genre feature in this dataset is unreliable).
  3. Both recommender systems will produce a unique set of songs to maximize listener choices.

Using Cosine Similarity

Recommending content involves predicting how likely a user is to like the recommended content [1]. The type of recommender system used in this notebook is a content-based recommender system. These recommender systems rely on features extracted from, or inherently present in, the items you would like to recommend. There are numerous methods to implement content-based recommenders; the simplest is to use similarity-based metrics, in this case Cosine similarity.

Applying non-Spark functions across a Spark Dataframe requires you to wrap your function as a UDF (user-defined function). It is also worth remembering that Spark runs on the JVM and is more akin to Java programming than to Python itself, so functions that involve data structures or return values generally need a declared return type. Several primitive and non-primitive types are available in the pyspark.sql.types module.

With user-defined functions in mind, we can now compute the Cosine similarity (one minus the Cosine distance) between a particular song from one decade and every song in another decade, using the simple formula below:

\begin{equation} Similarity = Cos(\theta) = \frac{A \cdot B}{||A||\hspace{1mm}||B||} = \frac{\sum_{i=1}^{n} A_iB_i}{\sqrt{\sum_{i=1}^{n}A^2_i}\sqrt{\sum_{i=1}^{n}B^2_i}} \end{equation}

This isn't too hard to implement manually, but the SciPy library already provides a cosine distance function; subtracting its result from 1 gives the similarity, which makes the implementation all the more convenient.
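To make the formula above concrete, here is a minimal manual implementation in plain Python. It is a sketch for illustration only: the function name is hypothetical, and the two vectors are rounded from rows shown in the notebook output, in the assembler's feature order.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Feature order: [duration, key, loudness, tempo, time_signature].
# Values are rounded from two rows of the notebook's sample output.
song_a = [233.4, 0.0, -10.7, 86.6, 4.0]
song_b = [183.7, 9.0, -13.4, 170.9, 4.0]

print(cosine_similarity(song_a, song_a))  # a vector against itself: 1.0 up to rounding
print(cosine_similarity(song_a, song_b))
```

Note that the features are compared on their raw scales here, as in the notebook, so features with large magnitudes such as duration and tempo contribute most to the dot product.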

Two functions were created, find_song and get_similar. The find_song function does what its name implies: it returns a Spark Dataframe of matching songs, which can be passed to get_similar. get_similar takes a Spark Dataframe and a decade of your choice from the views in section 5.1. It takes every item in the Spark Dataframe and computes its similarity to every item in the decade of choice, then returns another Spark Dataframe ordered by most similar.

Citations: [1] https://www.offerzen.com/blog/how-to-build-a-content-based-recommender-system-for-your-product

In [147]:
import pandas as pd
reduced = pd.read_parquet('100ksongs_LT10.parquet.gzip')
In [2]:
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql import functions as psf
sc = SparkContext('local[4]')
session = SparkSession(sc)
sqlContext = SQLContext(sc)
spark_df = sqlContext.createDataFrame(reduced)

Aggregating the features into a single column and filtering by decade

In [148]:
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler

assembler = VectorAssembler(
  inputCols = ['duration',
 'key',
 'loudness',
 'tempo',
 'time_signature'],
  outputCol = "feature_col")

spark_df_assem = assembler.transform(spark_df)
spark_df_assem.show(5)
+------------+---------+------+---+--------+-------+--------------+--------------------+--------------------+------------------+----+----------------+----------------+-------------+--------------------+
|danceability| duration|energy|key|loudness|  tempo|time_signature|               title|         artist_name|           song_id|year|duration_buckets|loudness_buckets|tempo_buckets|         feature_col|
+------------+---------+------+---+--------+-------+--------------+--------------------+--------------------+------------------+----+----------------+----------------+-------------+--------------------+
|         0.0|293.45914|   0.0|  1|  -7.551|127.543|             3|I Loved You In Me...|Kirk Whalum (Feat...|SOBZRQP12A6701F181|2003|            13.0|             7.0|          6.0|[293.45914,1.0,-7...|
|         0.0|360.75057|   0.0|  1| -13.424|125.002|             4|A New Soul Full o...|               Pheek|SOYTQLG12AB0186901|   0|            13.0|             6.0|          6.0|[360.75057,1.0,-1...|
|         0.0|  161.802|   0.0| 10|  -4.305|138.897|             1|Turtles All The W...|    Every Time I Die|SOGMJCX12AB0187792|2009|            12.0|             8.0|          6.0|[161.802,10.0,-4....|
|         0.0| 213.9424|   0.0|  7|  -9.008| 94.973|             1|Tira Ela De Mim (...|     Alexandre Pires|SOEUDKR12AB0181F03|2005|            12.0|             7.0|          5.0|[213.9424,7.0,-9....|
|         0.0|183.71873|   0.0|  9| -13.362| 170.94|             4|Are You Receiving...|                 XTC|SOLSUZT12A6D4F65EA|1978|            12.0|             6.0|          7.0|[183.71873,9.0,-1...|
+------------+---------+------+---+--------+-------+--------------+--------------------+--------------------+------------------+----+----------------+----------------+-------------+--------------------+
only showing top 5 rows

In [149]:
spark_df_assem.createOrReplaceTempView('spark_df')
In [150]:
dec_2k = session.sql('SELECT * FROM spark_df WHERE year LIKE "200%"')
dec_2k10 = session.sql('SELECT * FROM spark_df WHERE year LIKE "201%"')
dec_90s = session.sql('SELECT * FROM spark_df WHERE year LIKE "199%"')
dec_80s = session.sql('SELECT * FROM spark_df WHERE year LIKE "198%"')
dec_70s = session.sql('SELECT * FROM spark_df WHERE year LIKE "197%"')
dec_60s = session.sql('SELECT * FROM spark_df WHERE year LIKE "196%"')
dec_50s = session.sql('SELECT * FROM spark_df WHERE year LIKE "195%"')
dec_40s = session.sql('SELECT * FROM spark_df WHERE year LIKE "194%"')
dec_30s = session.sql('SELECT * FROM spark_df WHERE year LIKE "193%"')
dec_20s = session.sql('SELECT * FROM spark_df WHERE year LIKE "192%"')

Creating modular functions (similarity based recommender)

In [151]:
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType
from pyspark.sql.functions import desc
from scipy import spatial
cossim_udf = udf(lambda x,y: float(1-spatial.distance.cosine(x,y)),returnType=FloatType())

Decade_views

In [152]:
dec_2k10.createOrReplaceTempView('dec_2k10')
dec_2k.createOrReplaceTempView('dec_2k')
dec_90s.createOrReplaceTempView('dec_90s')
dec_80s.createOrReplaceTempView('dec_80s')
dec_70s.createOrReplaceTempView('dec_70s')
dec_60s.createOrReplaceTempView('dec_60s')
dec_50s.createOrReplaceTempView("dec_50s")
dec_40s.createOrReplaceTempView('dec_40s')
dec_30s.createOrReplaceTempView('dec_30s')
dec_20s.createOrReplaceTempView('dec_20s')

song finder function

In [153]:
def find_song(song_keyword=None,artist_keyword=None):
    #builds a LIKE query against the spark_df view from a title keyword, an artist keyword, or both
    to_append_song_keyword = "'%" +str(song_keyword) + "%'"
    to_append_artist_keyword = "'%" +str(artist_keyword) + "%'"
    
    if song_keyword is None:
        #artist-only search
        string_pattern = 'SELECT * FROM spark_df WHERE artist_name LIKE '
        final = string_pattern+to_append_artist_keyword
    elif artist_keyword is None:
        #title-only search
        string_pattern = 'SELECT * FROM spark_df WHERE title LIKE '
        final = string_pattern+to_append_song_keyword
    else:
        #both keywords must match
        string_pattern = 'SELECT * FROM spark_df WHERE title LIKE '
        combi = ' AND '
        string_pattern2 = 'artist_name LIKE '
        final = string_pattern+to_append_song_keyword+combi+string_pattern2+to_append_artist_keyword
    print(final)
    return session.sql(final)
    
In [154]:
query = find_song('Gravity','Sara')
SELECT * FROM spark_df WHERE title LIKE '%Gravity%' AND artist_name LIKE '%Sara%'
In [155]:
query.show()
+------------+---------+------+---+--------+------+--------------+-------+--------------+------------------+----+----------------+----------------+-------------+--------------------+
|danceability| duration|energy|key|loudness| tempo|time_signature|  title|   artist_name|           song_id|year|duration_buckets|loudness_buckets|tempo_buckets|         feature_col|
+------------+---------+------+---+--------+------+--------------+-------+--------------+------------------+----+----------------+----------------+-------------+--------------------+
|         0.0|233.37751|   0.0|  0| -10.666|86.623|             4|Gravity|Sara Bareilles|SONZPPA12AF72A9E13|2004|            12.0|             6.0|          4.0|[233.37751,0.0,-1...|
+------------+---------+------+---+--------+------+--------------+-------+--------------+------------------+----+----------------+----------------+-------------+--------------------+

Recommender function

In [156]:
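def get_similar(query,target_decade):
    # Cross join the query song(s) with every song in the target decade view,
    # then score each pair with cossim_udf and sort by similarity.
    query.createOrReplaceTempView('q')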
    string_pattern = 'SELECT distinct_T1.title as title1, distinct_T1.artist_name as artist_name1\
                            , distinct_T1.feature_col as feature_col1, \
                          distinct_T2.title, distinct_T2.artist_name, distinct_T2.feature_col FROM '
    string2 = 'q AS distinct_T1 CROSS JOIN '
    
    string4 = ' AS distinct_T2'
    final = string_pattern+string2+target_decade+string4
    print(final)
    new_table = session.sql(final)
    new_table = new_table.withColumn('Similarity',cossim_udf('feature_col1','feature_col'))
    return new_table.sort(desc('title1'),desc('Similarity'))
    #search which decade the song comes from first
In [157]:
get_similar(query,'dec_90s').show(10)
SELECT distinct_T1.title as title1, distinct_T1.artist_name as artist_name1                            , distinct_T1.feature_col as feature_col1,                           distinct_T2.title, distinct_T2.artist_name, distinct_T2.feature_col FROM q AS distinct_T1 CROSS JOIN dec_90s AS distinct_T2
+-------+--------------+--------------------+--------------------+--------------------+--------------------+----------+
| title1|  artist_name1|        feature_col1|               title|         artist_name|         feature_col|Similarity|
+-------+--------------+--------------------+--------------------+--------------------+--------------------+----------+
|Gravity|Sara Bareilles|[233.37751,0.0,-1...|Bambolina e barra...|             Ligabue|[315.71546,0.0,-1...| 0.9999867|
|Gravity|Sara Bareilles|[233.37751,0.0,-1...|I Can Only Give Y...|      Deville_ Willy|[296.48934,0.0,-1...| 0.9999856|
|Gravity|Sara Bareilles|[233.37751,0.0,-1...|               Stain|        Dead Man Ray|[191.21587,0.0,-8...|0.99998504|
|Gravity|Sara Bareilles|[233.37751,0.0,-1...|   Executioner Style|          Kool G Rap|[246.72608,0.0,-1...| 0.9999846|
|Gravity|Sara Bareilles|[233.37751,0.0,-1...|      Glocky Bit Ext|         Microstoria|[338.99057,1.0,-1...| 0.9999781|
|Gravity|Sara Bareilles|[233.37751,0.0,-1...|               Thing|       Freaky Chakra|[358.32118,0.0,-1...| 0.9999761|
|Gravity|Sara Bareilles|[233.37751,0.0,-1...|             Giggles|Fugees (Tranzlato...|[261.09342,1.0,-1...|0.99996823|
|Gravity|Sara Bareilles|[233.37751,0.0,-1...|    Llevas Mi Nombre|         Elsa Garcia|[243.19955,2.0,-1...| 0.9999661|
|Gravity|Sara Bareilles|[233.37751,0.0,-1...|           Soft Talk|        Shelby Lynne|[221.70077,1.0,-1...|  0.999965|
|Gravity|Sara Bareilles|[233.37751,0.0,-1...|           Soft Talk|        Shelby Lynne|[221.70077,1.0,-1...|  0.999965|
+-------+--------------+--------------------+--------------------+--------------------+--------------------+----------+
only showing top 10 rows

Jaccard Similarity

Load reduced dataset with 100k songs

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
reduced = pd.read_parquet('100ksongs_LT10.parquet.gzip')
In [38]:
reduced.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100000 entries, 0 to 0
Data columns (total 11 columns):
danceability      100000 non-null float64
duration          100000 non-null float64
energy            100000 non-null float64
key               100000 non-null int32
loudness          100000 non-null float64
tempo             100000 non-null float64
time_signature    100000 non-null int32
title             100000 non-null object
artist_name       100000 non-null object
song_id           100000 non-null object
year              100000 non-null int32
dtypes: float64(5), int32(3), object(3)
memory usage: 8.0+ MB
In [39]:
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql import functions as psf
In [4]:
sc = SparkContext('local[4]')
session = SparkSession(sc)
In [40]:
sqlContext = SQLContext(sc)
spark_df = sqlContext.createDataFrame(reduced)
In [41]:
spark_df.show(5)
+------------+---------+------+---+--------+-------+--------------+--------------------+--------------------+------------------+----+
|danceability| duration|energy|key|loudness|  tempo|time_signature|               title|         artist_name|           song_id|year|
+------------+---------+------+---+--------+-------+--------------+--------------------+--------------------+------------------+----+
|         0.0|293.45914|   0.0|  1|  -7.551|127.543|             3|I Loved You In Me...|Kirk Whalum (Feat...|SOBZRQP12A6701F181|2003|
|         0.0|360.75057|   0.0|  1| -13.424|125.002|             4|A New Soul Full o...|               Pheek|SOYTQLG12AB0186901|   0|
|         0.0|  161.802|   0.0| 10|  -4.305|138.897|             1|Turtles All The W...|    Every Time I Die|SOGMJCX12AB0187792|2009|
|         0.0| 213.9424|   0.0|  7|  -9.008| 94.973|             1|Tira Ela De Mim (...|     Alexandre Pires|SOEUDKR12AB0181F03|2005|
|         0.0|183.71873|   0.0|  9| -13.362| 170.94|             4|Are You Receiving...|                 XTC|SOLSUZT12A6D4F65EA|1978|
+------------+---------+------+---+--------+-------+--------------+--------------------+--------------------+------------------+----+
only showing top 5 rows

In [42]:
spark_df.select([psf.count(psf.when(psf.isnan(c), c)).alias(c) for c in spark_df.columns]).show()
+------------+--------+------+---+--------+-----+--------------+-----+-----------+-------+----+
|danceability|duration|energy|key|loudness|tempo|time_signature|title|artist_name|song_id|year|
+------------+--------+------+---+--------+-----+--------------+-----+-----------+-------+----+
|           0|       0|     0|  0|       0|    0|             0|    0|          0|      0|   0|
+------------+--------+------+---+--------+-----+--------------+-----+-----------+-------+----+

In [43]:
spark_df.select('danceability',
                'duration',
                'energy',
                'key',
                'loudness',
                'tempo',
                'time_signature').summary().show()
+-------+------------+------------------+------+-----------------+-------------------+------------------+------------------+
|summary|danceability|          duration|energy|              key|           loudness|             tempo|    time_signature|
+-------+------------+------------------+------+-----------------+-------------------+------------------+------------------+
|  count|      100000|            100000|100000|           100000|             100000|            100000|            100000|
|   mean|         0.0|249.96655455720014|   0.0|          5.30099|-10.120641150000004|123.96112129000008|           3.59116|
| stddev|         0.0| 127.3856987731741|   0.0|3.606006801539701|  5.207862289415197| 35.09329712003566|1.2225648453345987|
|    min|         0.0|           0.46975|   0.0|                0|            -56.145|               0.0|                 0|
|    25%|         0.0|         180.37506|   0.0|                2|            -12.662|            98.055|                 3|
|    50%|         0.0|         228.98893|   0.0|                5|             -8.949|            122.24|                 4|
|    75%|         0.0|          290.2722|   0.0|                9|             -6.383|            144.02|                 4|
|    max|         0.0|        3033.59955|   0.0|               11|              4.045|           296.469|                 7|
+-------+------------+------------------+------+-----------------+-------------------+------------------+------------------+

Data pre-processing: Data Binning and transformation to categorical types

For the duration, loudness, and tempo values, we performed data binning, replacing each continuous value with a representative of the interval it belongs to. For each field, we used the standard deviation as the bin width. The interval endpoints were computed by adding and subtracting multiples of the standard deviation to and from the mean, with open-ended first and last intervals so that every data point is represented. This approach also retains the shape of the distribution. We used the PySpark Bucketizer to assign each value to its interval automatically, which converts the field into a categorical type. The key and time_signature fields were treated as categorical values directly, since they are already integer types. All of these fields were transformed into categorical types because they are used as features for the Jaccard index similarity.
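
The binning rule described above can be sketched in plain NumPy for a single field. The helper names std_width_splits and assign_bucket are hypothetical and only approximate what Bucketizer does (a value falls in bucket i when splits[i] <= value < splits[i + 1]). The tempo statistics are the rounded values reported in this notebook.

```python
import numpy as np

def std_width_splits(mean, std, value_range):
    """Bin edges one standard deviation apart, centred on the mean."""
    n = int(np.ceil(value_range / std))
    if n % 2 != 0:  # use an even count so the edges sit symmetrically around the mean
        n += 1
    lowest = mean - (n // 2) * std
    inner = [lowest + i * std for i in range(n)]
    # open-ended first and last intervals catch any remaining values
    return [-np.inf] + inner + [np.inf]

def assign_bucket(value, splits):
    """Index i such that splits[i] <= value < splits[i + 1]."""
    return int(np.digitize(value, splits)) - 1

# Rounded tempo statistics from this notebook
tempo_splits = std_width_splits(mean=123.961, std=35.093, value_range=296.469)

bucket_a = assign_bucket(127.543, tempo_splits)  # falls in [123.961, 159.054)
bucket_b = assign_bucket(94.973, tempo_splits)   # falls in [88.868, 123.961)
```

With these inputs the sketch yields 12 split points and places the two sample tempos in buckets 6 and 5, the same bucket indices that appear in the tempo_buckets column for those values.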

In [56]:
import numpy as np
fields = ['duration', 'loudness', 'tempo']
stats = ['mean', 'std', 'max', 'min', 'range']
stats_dict = {}

for c in fields:
    keyname = c + '_mean'
    stats_dict[keyname] = np.round(spark_df.select(psf.mean(psf.col(c)).alias('stat')).collect()[0]['stat'], 3)

for c in fields:
    keyname = c + '_std'
    stats_dict[keyname] = np.round(spark_df.select(psf.stddev(psf.col(c)).alias('stat')).collect()[0]['stat'], 3)

for c in fields:
    keyname = c + '_max'
    stats_dict[keyname] = spark_df.select(psf.max(psf.col(c)).alias('stat')).collect()[0]['stat']

for c in fields:
    keyname = c + '_min'
    stats_dict[keyname] = spark_df.select(psf.min(psf.col(c)).alias('stat')).collect()[0]['stat']

for c in fields:
    keyname = c + '_range'
    stats_dict[keyname] = np.round(stats_dict[(c + '_max')] - stats_dict[(c + '_min')], 3)

stats_dict
Out[56]:
{'duration_mean': 249.967,
 'loudness_mean': -10.121,
 'tempo_mean': 123.961,
 'duration_std': 127.386,
 'loudness_std': 5.208,
 'tempo_std': 35.093,
 'duration_max': 3033.59955,
 'loudness_max': 4.045,
 'tempo_max': 296.469,
 'duration_min': 0.46975,
 'loudness_min': -56.145,
 'tempo_min': 0.0,
 'duration_range': 3033.13,
 'loudness_range': 60.19,
 'tempo_range': 296.469}
In [57]:
from pyspark.ml.feature import Bucketizer
In [58]:
splits_dict = {}
for colname in fields:
    _splits = []
    _splits.append(-1 * float('Inf'))
    meanval = stats_dict[colname + '_mean']
    stdval = stats_dict[colname + '_std']
    n = int(np.ceil(stats_dict[colname + '_range'] / stdval))
    if n % 2 != 0: # if odd number
        n += 1
    minval = meanval - (np.round(n/2)) * stdval
    for i in range(n):
        _splits.append(float(minval + i*stdval))
    _splits.append(float('Inf'))
    splits_dict[colname] = _splits
    
    bucketizer = Bucketizer(splits=_splits, inputCol=colname,
                            outputCol=(colname+"_buckets"), )
    spark_df = bucketizer.setHandleInvalid("keep").transform(spark_df)
In [59]:
splits_dict
Out[59]:
{'duration': [-inf,
  -1278.665,
  -1151.279,
  -1023.893,
  -896.507,
  -769.121,
  -641.735,
  -514.3489999999999,
  -386.96299999999997,
  -259.577,
  -132.19100000000003,
  -4.805000000000064,
  122.5809999999999,
  249.9670000000001,
  377.35300000000007,
  504.73900000000003,
  632.125,
  759.511,
  886.8969999999999,
  1014.2829999999999,
  1141.6689999999999,
  1269.0549999999998,
  1396.4409999999998,
  1523.8269999999998,
  1651.2129999999997,
  inf],
 'loudness': [-inf,
  -41.369,
  -36.161,
  -30.953,
  -25.744999999999997,
  -20.537,
  -15.329,
  -10.120999999999999,
  -4.912999999999997,
  0.2950000000000017,
  5.503,
  10.710999999999999,
  15.919000000000004,
  inf],
 'tempo': [-inf,
  -51.50400000000003,
  -16.41100000000003,
  18.681999999999974,
  53.77499999999998,
  88.86799999999998,
  123.961,
  159.05399999999997,
  194.147,
  229.24,
  264.333,
  inf]}
In [60]:
spark_df.show()
+------------+---------+------+---+--------+-------+--------------+--------------------+--------------------+------------------+----+----------------+----------------+-------------+
|danceability| duration|energy|key|loudness|  tempo|time_signature|               title|         artist_name|           song_id|year|duration_buckets|loudness_buckets|tempo_buckets|
+------------+---------+------+---+--------+-------+--------------+--------------------+--------------------+------------------+----+----------------+----------------+-------------+
|         0.0|293.45914|   0.0|  1|  -7.551|127.543|             3|I Loved You In Me...|Kirk Whalum (Feat...|SOBZRQP12A6701F181|2003|            13.0|             7.0|          6.0|
|         0.0|360.75057|   0.0|  1| -13.424|125.002|             4|A New Soul Full o...|               Pheek|SOYTQLG12AB0186901|   0|            13.0|             6.0|          6.0|
|         0.0|  161.802|   0.0| 10|  -4.305|138.897|             1|Turtles All The W...|    Every Time I Die|SOGMJCX12AB0187792|2009|            12.0|             8.0|          6.0|
|         0.0| 213.9424|   0.0|  7|  -9.008| 94.973|             1|Tira Ela De Mim (...|     Alexandre Pires|SOEUDKR12AB0181F03|2005|            12.0|             7.0|          5.0|
|         0.0|183.71873|   0.0|  9| -13.362| 170.94|             4|Are You Receiving...|                 XTC|SOLSUZT12A6D4F65EA|1978|            12.0|             6.0|          7.0|
|         0.0|245.05424|   0.0|  0|  -8.264|164.562|             1|         Sea Of Lies|          Symphony X|SOZFIUX12AB018AF20|1996|            12.0|             7.0|          7.0|
|         0.0|177.26649|   0.0|  1| -25.566| 126.44|             4|    Broke And Hungry|Blind Lemon Jeffe...|SOFHNPM12A8C1412AA|1927|            12.0|             4.0|          6.0|
|         0.0|193.33179|   0.0|  4|  -6.867|168.915|             4|     Soft Revolution|               Stars|SOYDQFR12AB017B27D|2004|            12.0|             7.0|          7.0|
|         0.0|278.67383|   0.0| 10|  -9.101|135.046|             4|   Stjärnornas Musik|    Mr Jones Machine|SOALFZR12AB018B1E3|2007|            13.0|             7.0|          6.0|
|         0.0|446.24934|   0.0|  2| -10.542|144.053|             4|Escape From Tulse...|                 OTT|SOVCTEX12A8C145841|2003|            14.0|             6.0|          6.0|
|         0.0|197.58975|   0.0|  8|  -3.763|177.793|             3|                 Asi|          Juaninacka|SOLYBLS12AB0189533|2004|            12.0|             8.0|          7.0|
|         0.0|195.47383|   0.0|  8|  -8.911|114.519|             3|       The Good Life|       Belinda Bruce|SOSFGZF12AB0182DC5|   0|            12.0|             7.0|          5.0|
|         0.0|260.25751|   0.0|  0|  -9.922|117.734|             5|          Clearlight|          Cordrazine|SOJQIZB12A8C139B47|1997|            13.0|             7.0|          5.0|
|         0.0|233.40363|   0.0|  0| -10.897| 95.978|             4|  Sidestick Eyepoker|Overwhelming Colo...|SOHKDRH12AB018481E|1994|            12.0|             6.0|          5.0|
|         0.0|  297.482|   0.0|  9|  -5.423| 95.598|             4|Nothing On Earth ...|           Jeff Deyo|SODQMPH12A8C1393B7|2007|            13.0|             7.0|          5.0|
|         0.0|238.88934|   0.0|  6|  -8.393|117.172|             4|    The Taliban Song|          Toby Keith|SOPUUDK12A6D4F9022|   0|            12.0|             7.0|          5.0|
|         0.0|174.75873|   0.0|  6|  -7.559|199.985|             4|Goodbye_ Freaks [...|           Kurt Vile|SOWHHGC12AC3DFB63F|2009|            12.0|             7.0|          8.0|
|         0.0|224.13016|   0.0|  9|  -3.961|133.007|             4|     Miles and Miles|The Frank And Wal...|SOYLKGY12A8AE4674E|   0|            12.0|             8.0|          6.0|
|         0.0|217.36444|   0.0|  9|  -7.708|100.114|             1|    Please Tie Me Up|The Impossible Sh...|SOLUMUI12A8C140212|   0|            12.0|             7.0|          5.0|
|         0.0|223.97342|   0.0| 11|  -5.458|113.897|             4|   Love Whip (Album)|Reverend Horton Heat|SOSMLKY12A8C134655|1991|            12.0|             7.0|          5.0|
+------------+---------+------+---+--------+-------+--------------+--------------------+--------------------+------------------+----+----------------+----------------+-------------+
only showing top 20 rows

In [61]:
spark_df.select('duration', 'duration_buckets', 'loudness', 'loudness_buckets', 'tempo', 'tempo_buckets').show()
+---------+----------------+--------+----------------+-------+-------------+
| duration|duration_buckets|loudness|loudness_buckets|  tempo|tempo_buckets|
+---------+----------------+--------+----------------+-------+-------------+
|293.45914|            13.0|  -7.551|             7.0|127.543|          6.0|
|360.75057|            13.0| -13.424|             6.0|125.002|          6.0|
|  161.802|            12.0|  -4.305|             8.0|138.897|          6.0|
| 213.9424|            12.0|  -9.008|             7.0| 94.973|          5.0|
|183.71873|            12.0| -13.362|             6.0| 170.94|          7.0|
|245.05424|            12.0|  -8.264|             7.0|164.562|          7.0|
|177.26649|            12.0| -25.566|             4.0| 126.44|          6.0|
|193.33179|            12.0|  -6.867|             7.0|168.915|          7.0|
|278.67383|            13.0|  -9.101|             7.0|135.046|          6.0|
|446.24934|            14.0| -10.542|             6.0|144.053|          6.0|
|197.58975|            12.0|  -3.763|             8.0|177.793|          7.0|
|195.47383|            12.0|  -8.911|             7.0|114.519|          5.0|
|260.25751|            13.0|  -9.922|             7.0|117.734|          5.0|
|233.40363|            12.0| -10.897|             6.0| 95.978|          5.0|
|  297.482|            13.0|  -5.423|             7.0| 95.598|          5.0|
|238.88934|            12.0|  -8.393|             7.0|117.172|          5.0|
|174.75873|            12.0|  -7.559|             7.0|199.985|          8.0|
|224.13016|            12.0|  -3.961|             8.0|133.007|          6.0|
|217.36444|            12.0|  -7.708|             7.0|100.114|          5.0|
|223.97342|            12.0|  -5.458|             7.0|113.897|          5.0|
+---------+----------------+--------+----------------+-------+-------------+
only showing top 20 rows

In [64]:
import matplotlib.pyplot as plt
bins, counts = spark_df.select("duration").rdd.flatMap(lambda x: x).histogram(100)
plt.hist(bins[:-1], bins=bins, weights=counts);
In [65]:
colname = 'duration_buckets'
keys_df = spark_df.groupBy(colname).count().orderBy('count', ascending=False)
rows = keys_df.collect()  # collect once and reuse for both lists
keys_arr = [int(row[colname]) for row in rows]
keys_count = [int(row['count']) for row in rows]

plt.bar(keys_arr, keys_count)
Out[65]:
<BarContainer object of 14 artists>
In [66]:
bins, counts = spark_df.select("loudness").rdd.flatMap(lambda x: x).histogram(100)
plt.hist(bins[:-1], bins=bins, weights=counts);
In [67]:
colname = 'loudness_buckets'
keys_df = spark_df.groupBy(colname).count().orderBy('count', ascending=False)
rows = keys_df.collect()  # collect once and reuse for both lists
keys_arr = [int(row[colname]) for row in rows]
keys_count = [int(row['count']) for row in rows]

plt.bar(keys_arr, keys_count)
Out[67]:
<BarContainer object of 10 artists>
In [68]:
bins, counts = spark_df.select("tempo").rdd.flatMap(lambda x: x).histogram(20)
plt.hist(bins[:-1], bins=bins, weights=counts);
In [69]:
colname = 'tempo_buckets'
keys_df = spark_df.groupBy(colname).count().orderBy('count', ascending=False)
rows = keys_df.collect()  # collect once and reuse for both lists
keys_arr = [int(row[colname]) for row in rows]
keys_count = [int(row['count']) for row in rows]

plt.bar(keys_arr, keys_count)
Out[69]:
<BarContainer object of 9 artists>
In [70]:
spark_df.select('year').summary().show()
+-------+-----------------+
|summary|             year|
+-------+-----------------+
|  count|           100000|
|   mean|        1031.2216|
| stddev|998.7301681500008|
|    min|                0|
|    25%|                0|
|    50%|             1970|
|    75%|             2002|
|    max|             2011|
+-------+-----------------+

In [71]:
spark_df.groupBy('year').count().orderBy('count', ascending=False).show(20)
+----+-----+
|year|count|
+----+-----+
|   0|48398|
|2007| 3998|
|2006| 3742|
|2005| 3578|
|2008| 3527|
|2009| 3130|
|2004| 3009|
|2003| 2738|
|2002| 2343|
|2001| 2066|
|2000| 1897|
|1999| 1824|
|1998| 1569|
|1997| 1537|
|1996| 1360|
|1995| 1298|
|1994| 1205|
|1993| 1086|
|2010|  961|
|1992|  955|
+----+-----+
only showing top 20 rows

In [72]:
spark_df.groupBy('key').count().orderBy('key').show(20)
+---+-----+
|key|count|
+---+-----+
|  0|12408|
|  1| 8415|
|  2|11195|
|  3| 3053|
|  4| 8153|
|  5| 7198|
|  6| 5762|
|  7|12718|
|  8| 5133|
|  9|11014|
| 10| 6716|
| 11| 8235|
+---+-----+

In [73]:
keys_df = spark_df.groupBy('key').count().orderBy('count', ascending=False)
rows = keys_df.collect()  # collect once and reuse for both lists
keys_arr = [int(row['key']) for row in rows]
keys_count = [int(row['count']) for row in rows]

plt.bar(keys_arr, keys_count)
plt.xticks(list(range(12)));
In [74]:
spark_df.groupBy('time_signature').count().orderBy('time_signature').show()
+--------------+-----+
|time_signature|count|
+--------------+-----+
|             0|   58|
|             1|13848|
|             3|12328|
|             4|65570|
|             5| 5684|
|             7| 2512|
+--------------+-----+

In [75]:
time_signature_df = spark_df.groupBy('time_signature').count().orderBy('count', ascending=False)
rows = time_signature_df.collect()  # collect once and reuse for both lists
keys_arr = [int(row['time_signature']) for row in rows]
keys_count = [int(row['count']) for row in rows]

plt.bar(keys_arr, keys_count)
plt.xticks(list(range(8)));

Aggregating the features into a single column and grouping by decade

To compute the Jaccard index similarity, we aggregated the features into a single array column. With a list/array type, we can use Python's set function to find the intersection and union of the feature sets of the two songs being compared.
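
A minimal sketch of the Jaccard index on two feature lists follows. The function name jaccard_similarity is an assumption for illustration, not the project's own implementation. The two sample lists use the prefixed feature format built in the cells below. The prefixes presumably keep values from different fields distinct inside the set (for example, key 4 versus time signature 4).

```python
def jaccard_similarity(features_a, features_b):
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(features_a), set(features_b)
    union = set_a | set_b
    if not union:  # two empty feature lists: define the similarity as 0
        return 0.0
    return len(set_a & set_b) / len(union)

# Feature lists in the prefixed format used by this notebook
song_1 = ['k_1', 'ts_3', 'd_13.0', 'l_7.0', 't_6.0']
song_2 = ['k_1', 'ts_4', 'd_13.0', 'l_6.0', 't_6.0']

score = jaccard_similarity(song_1, song_2)  # 3 shared features out of 7 distinct ones
```

Since every song has exactly five features, the score can only take a few discrete values (0, 1/9, 1/4, 3/7, 2/3, or 1), so ties between candidate songs are to be expected.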

In [76]:
songs_df = spark_df.select(
    'song_id',
    'title',
    'artist_name',
    'year',
    psf.concat(psf.lit("k_"), psf.col("key")).alias('key'),
    psf.concat(psf.lit("ts_"), psf.col("time_signature")).alias('time_signature'),
    psf.concat(psf.lit("d_"), psf.col("duration_buckets")).alias('duration'),
    psf.concat(psf.lit("l_"), psf.col("loudness_buckets")).alias('loudness'),
    psf.concat(psf.lit("t_"), psf.col("tempo_buckets")).alias('tempo')
)

songs_df.show(5)
+------------------+--------------------+--------------------+----+----+--------------+--------+--------+-----+
|           song_id|               title|         artist_name|year| key|time_signature|duration|loudness|tempo|
+------------------+--------------------+--------------------+----+----+--------------+--------+--------+-----+
|SOBZRQP12A6701F181|I Loved You In Me...|Kirk Whalum (Feat...|2003| k_1|          ts_3|  d_13.0|   l_7.0|t_6.0|
|SOYTQLG12AB0186901|A New Soul Full o...|               Pheek|   0| k_1|          ts_4|  d_13.0|   l_6.0|t_6.0|
|SOGMJCX12AB0187792|Turtles All The W...|    Every Time I Die|2009|k_10|          ts_1|  d_12.0|   l_8.0|t_6.0|
|SOEUDKR12AB0181F03|Tira Ela De Mim (...|     Alexandre Pires|2005| k_7|          ts_1|  d_12.0|   l_7.0|t_5.0|
|SOLSUZT12A6D4F65EA|Are You Receiving...|                 XTC|1978| k_9|          ts_4|  d_12.0|   l_6.0|t_7.0|
+------------------+--------------------+--------------------+----+----+--------------+--------+--------+-----+
only showing top 5 rows

In [77]:
spark_df_assem = songs_df.withColumn("features",psf.array('key', 'time_signature', 'duration', 'loudness', 'tempo'))
spark_df_assem.select(
    'song_id',
    'artist_name',
    'year',
    'features').show(5, truncate=False)
+------------------+---------------------------------------------------+----+----------------------------------+
|song_id           |artist_name                                        |year|features                          |
+------------------+---------------------------------------------------+----+----------------------------------+
|SOBZRQP12A6701F181|Kirk Whalum (Featuring Isaac Hayes and Wendy Moten)|2003|[k_1, ts_3, d_13.0, l_7.0, t_6.0] |
|SOYTQLG12AB0186901|Pheek                                              |0   |[k_1, ts_4, d_13.0, l_6.0, t_6.0] |
|SOGMJCX12AB0187792|Every Time I Die                                   |2009|[k_10, ts_1, d_12.0, l_8.0, t_6.0]|
|SOEUDKR12AB0181F03|Alexandre Pires                                    |2005|[k_7, ts_1, d_12.0, l_7.0, t_5.0] |
|SOLSUZT12A6D4F65EA|XTC                                                |1978|[k_9, ts_4, d_12.0, l_6.0, t_7.0] |
+------------------+---------------------------------------------------+----+----------------------------------+
only showing top 5 rows

In [78]:
spark_df_assem.createOrReplaceTempView('spark_df')

Finding test data

For our testing, we looked for songs from different decades. Later, we find similar songs for each of the decades from the 2010s back to the 1920s.

In [79]:
dec_2k10 = session.sql('SELECT * FROM spark_df WHERE year LIKE "201%"')
dec_2k10.select(
    'song_id',
    'artist_name',
    'year',
    'features').show(10, truncate=False)
+------------------+-------------------------+----+----------------------------------+
|song_id           |artist_name              |year|features                          |
+------------------+-------------------------+----+----------------------------------+
|SOMOZIH12AB018B522|The Magnetic Fields      |2010|[k_2, ts_3, d_12.0, l_7.0, t_6.0] |
|SOLEMFK12AC4688101|Lonely Drifter Karen     |2010|[k_1, ts_4, d_13.0, l_6.0, t_5.0] |
|SOVAICL12AB018C394|Los Autenticos Decadentes|2010|[k_4, ts_1, d_12.0, l_7.0, t_6.0] |
|SOQUYVA12AB0188078|Turin Brakes             |2010|[k_4, ts_3, d_12.0, l_6.0, t_6.0] |
|SOEFTYL12AB018A37E|Golden Triangle          |2010|[k_4, ts_4, d_11.0, l_8.0, t_5.0] |
|SOOYVRF12AB0190637|Chipmunk                 |2010|[k_4, ts_4, d_12.0, l_8.0, t_7.0] |
|SOYJWGG12AB018A2E8|Kate Nash                |2010|[k_7, ts_1, d_12.0, l_7.0, t_5.0] |
|SOVTNBY12AC4689BFE|Katell Keineg            |2010|[k_6, ts_1, d_12.0, l_7.0, t_6.0] |
|SOXUSXR12AB0185F3C|Cold War Kids            |2010|[k_4, ts_4, d_13.0, l_7.0, t_4.0] |
|SONXOEN12AB018D237|This Is Head             |2010|[k_11, ts_4, d_13.0, l_7.0, t_5.0]|
+------------------+-------------------------+----+----------------------------------+
only showing top 10 rows

In [80]:
dec_2k10.select(
    'song_id',
    'artist_name',
    'year',
    'features').orderBy('year', ascending=False).show(10, truncate=False)
+------------------+--------------------------------+----+----------------------------------+
|song_id           |artist_name                     |year|features                          |
+------------------+--------------------------------+----+----------------------------------+
|SODRNYV12AB018AA9B|The Joy Formidable              |2011|[k_2, ts_4, d_12.0, l_8.0, t_6.0] |
|SOBMAPO12AC9072D84|Fehlfarben                      |2010|[k_9, ts_4, d_12.0, l_7.0, t_5.0] |
|SOWRWLC12AB018CC8D|Silver Columns                  |2010|[k_1, ts_4, d_12.0, l_7.0, t_6.0] |
|SOXSRKA12AC3DF9619|The Sight Below                 |2010|[k_4, ts_1, d_13.0, l_5.0, t_6.0] |
|SOAXUMR12AB01859FD|Dark Tranquillity               |2010|[k_9, ts_4, d_14.0, l_7.0, t_5.0] |
|SOEVKSL12AB018BB6B|Alizée                          |2010|[k_6, ts_4, d_12.0, l_7.0, t_6.0] |
|SOYORPU12AB0188918|Ali Farka Touré_ Toumani Diabaté|2010|[k_11, ts_1, d_13.0, l_6.0, t_6.0]|
|SOBKXTM12AB0186F1F|Inspectah Deck                  |2010|[k_6, ts_4, d_11.0, l_7.0, t_6.0] |
|SOSGLFA12AB018FCDC|Tachenko                        |2010|[k_7, ts_4, d_12.0, l_7.0, t_6.0] |
|SOSNKHA12AB0185BD6|Write This Down                 |2010|[k_8, ts_4, d_12.0, l_8.0, t_8.0] |
+------------------+--------------------------------+----+----------------------------------+
only showing top 10 rows

In [81]:
dec_test = session.sql("SELECT * FROM spark_df WHERE artist_name like '%Spice Girls%'")
dec_test.select(
    'song_id',
    'title',
    'artist_name',
    'year').orderBy('title').show(truncate=False)
+------------------+-----------------------------------------+-----------+----+
|song_id           |title                                    |artist_name|year|
+------------------+-----------------------------------------+-----------+----+
|SODGHTG12B0B80CB0D|Denying                                  |Spice Girls|1997|
|SOKJTLQ12A8C13D151|Get Down With Me                         |Spice Girls|2000|
|SOELYXY12A8C13D742|Spice Up Your Life (Murk Cuba Libre Mix) |Spice Girls|1997|
|SOFTRQB12A81C20447|Spice Up Your Life (Ralphi's Radio Edit) |Spice Girls|0   |
|SOFBZYB12A8C13C29A|Stop (Stretch 'N' Vern's Rock & Roll Mix)|Spice Girls|1998|
|SOFBZYB12A8C13C29A|Stop (Stretch 'N' Vern's Rock & Roll Mix)|Spice Girls|1998|
|SOMTFQC12AF72A745D|Walk Of Life                             |Spice Girls|1997|
|SOGEDPI12A58A79FC1|Wannabe (Instrumental)                   |Spice Girls|1996|
+------------------+-----------------------------------------+-----------+----+

In [82]:
dec_test = session.sql("SELECT * FROM spark_df WHERE artist_name like '%Michael Jackson%'")
dec_test.select(
    'song_id',
    'title',
    'year').orderBy('title').show(truncate=False)
+------------------+----------------------------------------------+----+
|song_id           |title                                         |year|
+------------------+----------------------------------------------+----+
|SOMBOUW12A8C13F2E6|Another Part Of Me                            |1987|
|SOKDRCM12AF72A36EC|Bad                                           |1987|
|SOLISQK12A8C1416AF|Billie Jean                                   |1982|
|SOHZIPR12A8C1350D3|Butterflies                                   |2001|
|SODKXXA12AF72A5A61|Call On Me                                    |1984|
|SODZCSX12A58A78EDF|Children Of The Light                         |1972|
|SOVHVHR12AF72A9BFE|Don't Let It Get You Down                     |1984|
|SOOVGGD12A6D4F80F7|Farewell My Summer Love                       |1984|
|SOULIVR12CF5F8743B|Get On The Floor                              |1979|
|SOAKCNU12A8C135445|Gone Too Soon                                 |1991|
|SOCJFAB12CF54651F5|Got To Be There                               |1972|
|SOTLHDK12AB018A405|Happy (Love Theme From "Lady Sings The Blues")|1992|
|SOBKWJW12A6701D9C6|Happy (Love Theme From "Lady Sings The Blues")|1992|
|SOAMOLF12A8C143D8E|Heal The World                                |1991|
|SOCRWPI12CF5F86CF9|I Can't Help It                               |1979|
|SOKIOOC12AF729ED9E|In The Closet                                 |1991|
|SOIPYQO12A6D4F82DE|Maria (You Were The Only One)                 |1972|
|SOVLYRC12A6D4F80E9|Melodie                                       |1984|
|SOPINQU12A58A7C781|Melodie                                       |1984|
|SORKJDV12A8C132F46|Morphine                                      |1997|
+------------------+----------------------------------------------+----+
only showing top 20 rows

In [83]:
dec_test = session.sql("SELECT * FROM spark_df WHERE artist_name like '%Sara Bareilles%'")
dec_test.select(
    'song_id',
    'title',
    'year').orderBy('title').show(truncate=False)
+------------------+-----------------+----+
|song_id           |title            |year|
+------------------+-----------------+----+
|SOMUZEP12AB017E624|Fairytale        |2004|
|SONZPPA12AF72A9E13|Gravity          |2004|
|SOBTTJM12A8AE47413|Love On The Rocks|2004|
+------------------+-----------------+----+

In [84]:
dec_test = session.sql("SELECT * FROM spark_df WHERE artist_name like '%Elvis Pres%'")
dec_test.select(
    'song_id',
    'title',
    'year').orderBy('title').show(truncate=False)
+------------------+--------------------------------------------------------------------------------------------+----+
|song_id           |title                                                                                       |year|
+------------------+--------------------------------------------------------------------------------------------+----+
|SOCVZQQ12A8C137920|(You're The) Devil In Disguise                                                              |1963|
|SOCVZQQ12A8C137920|(You're The) Devil In Disguise                                                              |1963|
|SORQFRB12A58A7860F|Blue Moon                                                                                   |1956|
|SOBLILW12A8C143D33|Can't Help Falling In Love                                                                  |1961|
|SOKWEUK12A8C136932|Can't Help Falling In Love                                                                  |1961|
|SOLDNGD12D021932DB|I Dont Care If The Sun Dont Shine                                                         |0   |
|SODHQNP12A58A7E18A|I Need Your Love Tonight (Master Recordings - RCA Studio B_ Nashvilla Tennessee - June 1958)|0   |
|SOLBVZH12AB017D26B|I'll Never Let You Go                                                                       |1987|
|SOFRVBM12AB018D38F|I'll Never Let You Go - Little Darling                                                      |0   |
|SOAMYSH12A8C138265|If We Never Meet Again                                                                      |1960|
|SOFMEOF12A8C13B666|Let Yourself Go                                                                             |1995|
|SODXWIV12D02198D1C|Moody Blue                                                                                  |1976|
|SODKBQX12A8C13A6EA|My Way                                                                                      |1989|
|SOBJTXG12D02198DC2|That's All Right                                                                            |1954|
|SOENWXB12A8C137754|That's What You Get For Lovin' Me                                                           |0   |
|SOFPDWE12AF72A18F1|That's When Your Heartaches Begin                                                           |1957|
|SOLXYIF12A8C1399F1|Today_ Tomorrow And Forever                                                                 |0   |
|SOXUMLF12A8C138FEA|Too Much                                                                                    |1958|
|SOASBRF12A8C1391F3|You Asked Me To                                                                             |0   |
+------------------+--------------------------------------------------------------------------------------------+----+

In [85]:
dec_test = session.sql("SELECT * FROM spark_df WHERE artist_name like '%Doris Day%'")
dec_test.select(
    'song_id',
    'title',
    'year').orderBy('title').show(truncate=False)
+------------------+------------------------------------------------------------------------------+----+
|song_id           |title                                                                         |year|
+------------------+------------------------------------------------------------------------------+----+
|SOPQSVF12AB018B387|Autumn Leaves                                                                 |2002|
|SOMNXDK12AB017F6C1|Deadwood Stage (Calamity Jane)                                                |0   |
|SOOXHHK12AC468693A|Hello_ My Lover_ Goodbye                                                      |0   |
|SOOXHHK12AC468693A|Hello_ My Lover_ Goodbye                                                      |0   |
|SODUCON12A81C21FB0|It's A Great Feeling (Previously Unreleased from 'It's A Great Feeling'_ 1949)|0   |
|SODMRHY12AAF3B39D9|Lets Walk That-A-Way                                                          |0   |
|SORXLWS12AB01866F3|Que Sera_ Sera (Whatever Will Be_ Will Be)                                    |0   |
|SOQXXTL12AB017DFFA|Ready Willing and Able                                                        |1993|
|SONJKQR12AB0183507|Ten Thousand Four Hundred Thirty-Two Sheep                                    |0   |
|SOMBMCA12AB017F1EA|The Pajama Game                                                               |0   |
|SOQDAFT12AC3DF6B3A|Tis Harry I'm Plannin' To Marry                                               |0   |
|SOQDAFT12AC3DF6B3A|Tis Harry I'm Plannin' To Marry                                               |0   |
|SOFRNVE12A58A80305|Whatever Will Be Will Be (Que Sera Sera)                                      |2005|
|SOYRMLQ12AC4687F70|When Tonight Is Just A Memory                                                 |0   |
|SOFYBDH12A8C13E7C0|Why Should We Both Be Lonely                                                  |0   |
+------------------+------------------------------------------------------------------------------+----+

In [86]:
dec_test = session.sql("SELECT * FROM spark_df WHERE artist_name like '%Abba%'")
dec_test.select(
    'song_id',
    'title',
    'year').orderBy('title').show(truncate=False)
+------------------+--------------------------------------------------------------------------------+----+
|song_id           |title                                                                           |year|
+------------------+--------------------------------------------------------------------------------+----+
|SONTRUM12A8C132778|1812 Overture_ Op.49 (Finale)                                                   |0   |
|SOGWZPI12A58A75FE3|Chiquitita                                                                      |1978|
|SOPCPOL12CF5F87CCC|Dancing Queen                                                                   |1976|
|SONEGMZ12A8C140858|Eagle                                                                           |1988|
|SOGPKOL12A8C132FCD|I'm A Marionette                                                                |1977|
|SOKCDBQ12AB0183568|If It Wasn't For The Nights                                                     |0   |
|SOKCDBQ12AB0183568|If It Wasn't For The Nights                                                     |0   |
|SOWBKSX12A6701FC8D|King Kong Song                                                                  |1974|
|SORMLJU12A8C13EEEE|Mamma Mia                                                                       |1996|
|SORMLJU12A8C13EEEE|Mamma Mia                                                                       |1996|
|SOBFQUM12A8C13663E|Mass in C minor_ K. 427 (417a)/III. Credo: Credo in unum Deum (Allegro maestoso)|0   |
|SOKSBUU12A8C1345BF|Phew! I feel terrible! Let me catch my breath"   (Anatoly Kotcherga)            |0   |
|SOKEZTU12A58A780A1|Scene 2: Introduction                                                           |0   |
|SORAFBF12A67AE13F0|Take A Chance On Me                                                             |1977|
|SOBDATJ12A58A7B2B6|Take A Chance On Me                                                             |1977|
|SOIHSRQ12CF5F87AFE|That's Me                                                                       |1976|
|SOLFFFD12AF729E0D1|Under Attack                                                                    |1982|
|SOWJSTY12CF5F88002|Under Attack                                                                    |1982|
|SOWGZNH12A8C134292|Well_ then? Let's go and vote_ boyars"   (Boyars)                               |0   |
|SOMMYUP12D021B0EED|What About Livingstone                                                          |1974|
+------------------+--------------------------------------------------------------------------------+----+
only showing top 20 rows

In [87]:
dec_test = session.sql("SELECT * FROM spark_df WHERE artist_name like '%Kelly Clarkson%'")
dec_test.select(
    'song_id',
    'title',
    'year').orderBy('title').show(truncate=False)
+------------------+---------------------------------------+----+
|song_id           |title                                  |year|
+------------------+---------------------------------------+----+
|SOXISBL12A8C137BC2|(You Make Me Feel Like A) Natural Woman|2002|
|SOUWYEZ12D0219189A|Because Of You                         |2004|
|SOJKJGM12A8C13C4FC|Hear Me                                |2004|
|SOABXUX12AB017EC46|Maybe                                  |2007|
|SOXLDOC12A8C144672|My Life Would Suck Without You         |2009|
|SOJMSNS12AF72AA844|Walk Away                              |2004|
|SOOPKCC12A58A7FFFB|Walk Away                              |2004|
+------------------+---------------------------------------+----+

Initial Testing

In [88]:
df_test = session.sql('SELECT DISTINCT * FROM spark_df WHERE year LIKE "199%" and song_id == "SOFBZYB12A8C13C29A"')
# df_test = session.sql('SELECT * FROM spark_df WHERE year LIKE "0" and song_id == "SORKKFF12AB0189420"')
df_test.show()
+------------------+--------------------+-----------+----+---+--------------+--------+--------+-----+--------------------+
|           song_id|               title|artist_name|year|key|time_signature|duration|loudness|tempo|            features|
+------------------+--------------------+-----------+----+---+--------------+--------+--------+-----+--------------------+
|SOFBZYB12A8C13C29A|Stop (Stretch 'N'...|Spice Girls|1998|k_1|          ts_4|  d_15.0|   l_7.0|t_6.0|[k_1, ts_4, d_15....|
+------------------+--------------------+-----------+----+---+--------------+--------+--------+-----+--------------------+

In [89]:
dec_2k10.createOrReplaceTempView('Table1')
df_test.createOrReplaceTempView('Table2')
In [90]:
df_crossjoin = session.sql('SELECT distinct_T1.title as title1, distinct_T1.artist_name as artist_name1\
                            , distinct_T1.features as features1, \
                          distinct_T2.title, distinct_T2.artist_name, distinct_T2.features as features2 \
                          FROM Table1 AS distinct_T1 CROSS JOIN Table2 AS distinct_T2')
df_crossjoin.show(5)
+-------------------+--------------------+--------------------+--------------------+-----------+--------------------+
|             title1|        artist_name1|           features1|               title|artist_name|           features2|
+-------------------+--------------------+--------------------+--------------------+-----------+--------------------+
|Always Already Gone| The Magnetic Fields|[k_2, ts_3, d_12....|Stop (Stretch 'N'...|Spice Girls|[k_1, ts_4, d_15....|
|     Wonderous Ways|Lonely Drifter Karen|[k_1, ts_4, d_13....|Stop (Stretch 'N'...|Spice Girls|[k_1, ts_4, d_15....|
|             Jopito|Los Autenticos De...|[k_4, ts_1, d_12....|Stop (Stretch 'N'...|Spice Girls|[k_1, ts_4, d_15....|
|        Paper Heart|        Turin Brakes|[k_4, ts_3, d_12....|Stop (Stretch 'N'...|Spice Girls|[k_1, ts_4, d_15....|
|      Cinco de Mayo|     Golden Triangle|[k_4, ts_4, d_11....|Stop (Stretch 'N'...|Spice Girls|[k_1, ts_4, d_15....|
+-------------------+--------------------+--------------------+--------------------+-----------+--------------------+
only showing top 5 rows

In [91]:
from pyspark.sql.types import DoubleType

# Jaccard index of two feature lists: intersection size divided by union size
jaccard_udf = psf.udf(lambda x, y: float(len(set(x) & set(y))) / len(set(x) | set(y)),
                      DoubleType())
In [92]:
df_result = df_crossjoin.withColumn(
    'jaccard_similarity', jaccard_udf('features1', 'features2'))
In [93]:
df_result.select('title1', 'artist_name1', 'title', 'artist_name',
                 'jaccard_similarity').orderBy('jaccard_similarity', ascending=False).show(20)
+--------------------+-------------------+--------------------+-----------+------------------+
|              title1|       artist_name1|               title|artist_name|jaccard_similarity|
+--------------------+-------------------+--------------------+-----------+------------------+
|          Alley Cats|           Hot Chip|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|           Starlight|        Steve Brian|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|      Time For Samba|         Swen Weber|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|        Strange Love|              Huski|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|               Afire|   We Are The World|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|     Baroque Digital|       Acid Casuals|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|    Container No 905|      Mardi Gras.BB|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|       Yes And Dance|     Silver Columns|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|Fighting Furies (...|          Charmaine|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|   Brothers In Blood|               Keel|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|       No More Blood|              Gaudi|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|    Dark Reflections|   Decoded Feedback|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|               Miami|            Oddisee|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|      Mixed Feelings|       Arisen Flame|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|A Crack in the Sp...|Postmortem Promises|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|       City of Straw|          Sightings|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|           56% Proof|         Belleruche|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|      Broken Hearted|     Pulcher Femina|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|   Hopeless Romantic|    Raheem Devaughn|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
|Pilot In The Sky ...|          Threshold|Stop (Stretch 'N'...|Spice Girls|0.6666666666666666|
+--------------------+-------------------+--------------------+-----------+------------------+
only showing top 20 rows

Find Similar Songs Function

For our recommender system using Jaccard similarity, we created a function that accepts a song_id and the decade in which to look for similar songs. The Jaccard index is the ratio of the size of the intersection of two sets to the size of their union. Here, each set consists of one song's features: key, time signature, duration, tempo, and loudness.

$$J(A,B) = \dfrac{|A \cap B|}{|A \cup B|} = \dfrac{|A \cap B|}{|A| + |B| - |A \cap B|} $$

If two songs have identical feature sets, the index is 1.0; if they share no features, it is 0. In general, the Jaccard index takes values between 0 and 1.0, inclusive.
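
As a quick sanity check on the formula, the plain-Python sketch below (no Spark required) computes the index for small feature lists. The helper name `jaccard`, the choice to return 0.0 for two empty lists, and the `one_off` and `two_off` lists are illustrative assumptions; only the token format (`k_`, `ts_`, `d_`, `l_`, `t_`) is taken from the notebook.

```python
def jaccard(a, b):
    """Jaccard index of two token lists: intersection size / union size."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty feature lists are scored 0.0 instead of raising.
        return 0.0
    return len(set_a & set_b) / len(union)


# Five tokens per song: key, time signature, duration bin, loudness bin, tempo bin.
song = ['k_1', 'ts_4', 'd_15.0', 'l_7.0', 't_6.0']
same = ['k_1', 'ts_4', 'd_15.0', 'l_7.0', 't_6.0']      # hypothetical: all 5 shared
one_off = ['k_2', 'ts_4', 'd_15.0', 'l_7.0', 't_6.0']   # hypothetical: 4 shared
two_off = ['k_2', 'ts_3', 'd_15.0', 'l_7.0', 't_6.0']   # hypothetical: 3 shared

for other in (same, one_off, two_off):
    print(jaccard(song, other))
```

Because every song carries exactly five tokens, two songs that share k of them score k / (10 - k). The index can therefore only take the values 0, 1/9, 1/4, 3/7, 2/3, and 1, which is why the result tables contain only a handful of distinct scores and many ties.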

In [94]:
dec_test = session.sql("SELECT * FROM spark_df WHERE artist_name like '%ARTIST%'")
dec_test.select(
    'song_id',
    'title',
    'artist_name',
    'year').orderBy('title').show(truncate=False)


#MILLIE : Run this cell
# dec_test = session.sql("SELECT * FROM spark_df WHERE title like '%TITLE%'")
# dec_test.select(
#     'song_id',
#     'title',
#     'artist_name',
#     'year').orderBy('title').show(truncate=False)
+-------+-----+-----------+----+
|song_id|title|artist_name|year|
+-------+-----+-----------+----+
+-------+-----+-----------+----+

In [95]:
from pyspark.sql.types import DoubleType

# Jaccard index of two feature lists: intersection size divided by union size
jaccard_udf = psf.udf(lambda x, y: float(len(set(x) & set(y))) / len(set(x) | set(y)),
                      DoubleType())

# Map each decade label to the first three digits of the years it covers.
DECADE_PREFIXES = {
    'dec_2k1s': '201',
    'dec_2ks': '200',
    'dec_90s': '199',
    'dec_80s': '198',
    'dec_70s': '197',
    'dec_60s': '196',
    'dec_50s': '195',
    'dec_40s': '194',
    'dec_30s': '193',
    'dec_20s': '192',
}


def get_decade_table(decade):
    """Return the songs released in the given decade, or None for an unknown label."""
    prefix = DECADE_PREFIXES.get(decade)
    if prefix is None:
        return None

    return session.sql(
        'SELECT * FROM spark_df WHERE year LIKE "{}%"'.format(prefix))


def find_similar_songs(song_id, decade, top_n, truncate=False):
    decade_table = get_decade_table(decade)
    decade_table.createOrReplaceTempView('Table1')

    df_song = session.sql(
        "SELECT * FROM spark_df WHERE song_id == '{}'".format(song_id))
    df_song.createOrReplaceTempView('Table2')

    df_crossjoin = session.sql(
        '''SELECT 
        distinct_T1.title as title,
        distinct_T1.artist_name as artist_name,
        distinct_T1.features as features,
        distinct_T1.year as year,
        distinct_T2.title as title2,
        distinct_T2.artist_name as artist_name2,
        distinct_T2.features as features2
        FROM Table1 AS distinct_T1 CROSS JOIN Table2 AS distinct_T2
        ''')

    df_result = df_crossjoin.withColumn(
        'jaccard_similarity', jaccard_udf('features', 'features2'))

    song_choice = df_song.select('title', 'artist_name', 'year').collect()[0]
    print('Song of choice: ', song_choice['title'])
    print('Artist: ', song_choice['artist_name'])
    print('Year: ', song_choice['year'])

    print('\nYour next playlist includes %d songs from decade %s.' %(top_n, decade))
    # Deduplicate first, then sort: distinct() does not guarantee that an earlier ordering survives.
    df_result.select('title', 'artist_name', 'year', 'jaccard_similarity').distinct().orderBy(
        'jaccard_similarity', ascending=False).show(top_n, truncate=truncate)
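
The ranking logic of `find_similar_songs` can also be illustrated without a Spark session. The sketch below is a simplified stand-in rather than the function itself: `rank_candidates`, the placeholder titles, and all feature lists are hypothetical, and it works on in-memory tuples instead of DataFrames. It scores each candidate against the chosen song, drops duplicate rows, and returns the best matches first.

```python
def jaccard(a, b):
    """Jaccard index of two token lists: intersection size / union size."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def rank_candidates(song_features, candidates, top_n):
    """Return up to top_n unique (title, artist, year, score) rows, best first."""
    # Building a set removes exact duplicate rows, similar in spirit to distinct().
    scored = {
        (title, artist, year, jaccard(features, song_features))
        for title, artist, year, features in candidates
    }
    # Sort by score (descending), then by title so that ties have a stable order.
    return sorted(scored, key=lambda row: (-row[3], row[0]))[:top_n]


# Hypothetical data in the notebook's token format.
chosen = ['k_1', 'ts_4', 'd_15.0', 'l_7.0', 't_6.0']
candidates = [
    ('Song C', 'Artist C', '1989', ['k_3', 'ts_3', 'd_12.0', 'l_7.0', 't_6.0']),
    ('Song B', 'Artist B', '1986', ['k_3', 'ts_4', 'd_15.0', 'l_7.0', 't_6.0']),
    ('Song B', 'Artist B', '1986', ['k_3', 'ts_4', 'd_15.0', 'l_7.0', 't_6.0']),
    ('Song A', 'Artist A', '1985', ['k_1', 'ts_4', 'd_15.0', 'l_7.0', 't_6.0']),
]

result = rank_candidates(chosen, candidates, top_n=3)
for row in result:
    print(row)
```

The duplicated "Song B" row appears only once in the result, mirroring how repeated `song_id` rows in `spark_df` would otherwise show up as repeated recommendations.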

For the sample songs from the 100k dataset, a song is most similar to songs from the adjacent decades. Once the gap reaches two or more decades, the similarity starts to decline. This result is expected, since tempo and time signature differ noticeably across decades. With the full million songs, it may still be possible to find very similar songs three or four decades away.

For the sample song Stop by Spice Girls, the closest matches come from the 2000s and the 1980s. The similarity declines from the 1970s back to the 1920s.

Stop by Spice Girls

In [96]:
songchoice_id = 'SOKNLUS12AB0186A2C'  # Toxic by Britney Spears

find_similar_songs(songchoice_id, 'dec_90s', 20)
Song of choice:  Toxic
Artist:  Britney Spears
Year:  2003

Your next playlist includes 20 songs from decade dec_90s.
+---------------------------------+----------------------------+----+------------------+
|title                            |artist_name                 |year|jaccard_similarity|
+---------------------------------+----------------------------+----+------------------+
|A Ra                             |Leo Gandelman               |1999|1.0               |
|Souffles H (King Street Club Mix)|Mondo Grosso                |1996|1.0               |
|Fourteen Wives                   |Screamin' Jay Hawkins       |1995|1.0               |
|St. John Of Gods                 |Shane MacGowan And The Popes|1997|1.0               |
|Kidney Stew                      |Pinetop Perkins             |1998|1.0               |
|Cut For Life                     |Leftfield                   |1995|1.0               |
|Legacy                           |The Space Brothers          |1999|1.0               |
|Disco Eterno                     |Soda Stereo                 |1995|1.0               |
|Somewhere In Time                |IQ                          |1997|1.0               |
|Incantation                      |Delerium                    |1994|1.0               |
|Superstring                      |Cygnus X                    |1995|1.0               |
|Pokinoï                          |Cirque du Soleil            |1992|1.0               |
|What the Hell Went Wrong         |The Four Horsemen           |1996|1.0               |
|2 Times                          |Ann Lee                     |1999|1.0               |
|Red Alert                        |Basement Jaxx               |1999|1.0               |
|From A Window                    |Northern Uproar             |1996|0.6666666666666666|
|Plastic Dreams                   |Jaydee                      |1992|0.6666666666666666|
|Lightning Breaks                 |Plasmatics                  |1996|0.6666666666666666|
|Skydiving                        |The Bongos                  |1992|0.6666666666666666|
|Excellent                        |Shampoo                     |1993|0.6666666666666666|
+---------------------------------+----------------------------+----+------------------+
only showing top 20 rows

In [240]:
songchoice_id = 'SOFBZYB12A8C13C29A'  # Stop by Spice Girls

find_similar_songs(songchoice_id, 'dec_2k1s', 20)
Song of choice:  Stop (Stretch 'N' Vern's Rock & Roll Mix)
Artist:  Spice Girls
Year:  1998

Your next playlist includes 20 songs from decade dec_2k1s.
+-----------------------------------------+----------------+----+------------------+
|title                                    |artist_name     |year|jaccard_similarity|
+-----------------------------------------+----------------+----+------------------+
|No More Blood                            |Gaudi           |2010|0.6666666666666666|
|56% Proof                                |Belleruche      |2010|0.6666666666666666|
|Miami                                    |Oddisee         |2010|0.6666666666666666|
|Broken Hearted                           |Pulcher Femina  |2010|0.6666666666666666|
|Pilot In The Sky Of Dreams (Instrumental)|Threshold       |2010|0.6666666666666666|
|Dark Reflections                         |Decoded Feedback|2010|0.6666666666666666|
|Mixed Feelings                           |Arisen Flame    |2010|0.6666666666666666|
|City of Straw                            |Sightings       |2010|0.6666666666666666|
|Yes And Dance                            |Silver Columns  |2010|0.6666666666666666|
|Afire                                    |We Are The World|2010|0.6666666666666666|
|Fighting Furies (Album Version)          |Charmaine       |2010|0.6666666666666666|
|Baroque Digital                          |Acid Casuals    |2010|0.6666666666666666|
|Hopeless Romantic                        |Raheem Devaughn |2010|0.6666666666666666|
|Starlight                                |Steve Brian     |2010|0.6666666666666666|
|Strange Love                             |Huski           |2010|0.6666666666666666|
|Brothers In Blood                        |Keel            |2010|0.6666666666666666|
|House Cleaning                           |Mavado          |2010|0.6666666666666666|
|Unfit To Live                            |Living Sacrifice|2010|0.6666666666666666|
|All In                                   |Lifehouse       |2010|0.6666666666666666|
|Ørkenvandring                            |Prins Thomas    |2010|0.6666666666666666|
+-----------------------------------------+----------------+----+------------------+
only showing top 20 rows

In [241]:
find_similar_songs(songchoice_id, 'dec_2ks', 10)
Song of choice:  Stop (Stretch 'N' Vern's Rock & Roll Mix)
Artist:  Spice Girls
Year:  1998

Your next playlist includes 10 songs from decade dec_2ks.
+--------------------------------+---------------------------------+----+------------------+
|title                           |artist_name                      |year|jaccard_similarity|
+--------------------------------+---------------------------------+----+------------------+
|Tracking Treasure Down          |Gabriel & Dresden                |2006|1.0               |
|Reverence                       |Hemstock & Jennings vs Adam White|2004|1.0               |
|Silver (Shiloh's Futureprog Rmx)|Tripswitch                       |2007|1.0               |
|We Are One                      |Kelly Sweet                      |2007|1.0               |
|Stalker                         |Probspot                         |2008|1.0               |
|Summer Calling                  |Andain                           |2002|1.0               |
|Bucci Bag                       |Andrea Doria                     |2002|1.0               |
|Remind (1995)                   |Orbital                          |2007|1.0               |
|Cerealogy                       |Crop Circles                     |2008|1.0               |
|Death Is This Communion         |High On Fire                     |2007|1.0               |
+--------------------------------+---------------------------------+----+------------------+
only showing top 10 rows

In [213]:
find_similar_songs(songchoice_id, 'dec_90s', 10)
Song of choice:  Stop (Stretch 'N' Vern's Rock & Roll Mix)
Artist:  Spice Girls
Year:  1998

Your next playlist includes 10 songs from decade dec_90s.
+-----------------------------------------+----------------------------+----+------------------+
|title                                    |artist_name                 |year|jaccard_similarity|
+-----------------------------------------+----------------------------+----+------------------+
|Christmas With Satan                     |James White And The Blacks  |1995|1.0               |
|Science Of The Gods                      |Eat Static                  |1997|1.0               |
|Stop (Stretch 'N' Vern's Rock & Roll Mix)|Spice Girls                 |1998|1.0               |
|alala                                    |California Sunshine         |1997|1.0               |
|Revolution 909 (Roger Sanchez Remix)     |Daft Punk                   |1998|1.0               |
|Flying Saucer Landing                    |Ubar Tmar                   |1997|1.0               |
|Lord Of The Dance                        |Battle of The Future Buddhas|1998|1.0               |
|Unity                                    |DJ Orkidea                  |1999|1.0               |
|Luck (Album)                             |Supersuckers                |1992|0.6666666666666666|
|Saturday Night Party                     |Alex Party                  |1993|0.6666666666666666|
+-----------------------------------------+----------------------------+----+------------------+
only showing top 10 rows

In [214]:
find_similar_songs(songchoice_id, 'dec_80s', 3)
Song of choice:  Stop (Stretch 'N' Vern's Rock & Roll Mix)
Artist:  Spice Girls
Year:  1998

Your next playlist includes 3 songs from decade dec_80s.
+---------------+-------------------------+----+------------------+
|title          |artist_name              |year|jaccard_similarity|
+---------------+-------------------------+----+------------------+
|Uptown Festival|Shalamar                 |1989|1.0               |
|Nice           |Liliput                  |1986|0.6666666666666666|
|Over The Points|Ian Dury & The Blockheads|1980|0.6666666666666666|
+---------------+-------------------------+----+------------------+
only showing top 3 rows

In [215]:
find_similar_songs(songchoice_id, 'dec_70s', 10)
Song of choice:  Stop (Stretch 'N' Vern's Rock & Roll Mix)
Artist:  Spice Girls
Year:  1998

Your next playlist includes 10 songs from decade dec_70s.
+-------------------------------+----------------+----+------------------+
|title                          |artist_name     |year|jaccard_similarity|
+-------------------------------+----------------+----+------------------+
|Warriors                       |Thin Lizzy      |1976|0.6666666666666666|
|Painter Man                    |Boney M.        |1978|0.6666666666666666|
|Give me love                   |Cerrone         |1978|0.6666666666666666|
|Do not disturb                 |Air             |1971|0.6666666666666666|
|Baby Let Me Kiss You           |King Floyd      |1971|0.6666666666666666|
|People You Can't Trust         |Atomic Rooster  |1972|0.6666666666666666|
|Ketchy Shuby                   |Peter Tosh      |1976|0.6666666666666666|
|Thunderfoot                    |Seals and Crofts|1976|0.6666666666666666|
|No confusion                   |Linval Thompson |1978|0.6666666666666666|
|Back Street Luv (Album Version)|Curved Air      |1971|0.6666666666666666|
+-------------------------------+----------------+----+------------------+
only showing top 10 rows

In [216]:
find_similar_songs(songchoice_id, 'dec_60s', 10)
Song of choice:  Stop (Stretch 'N' Vern's Rock & Roll Mix)
Artist:  Spice Girls
Year:  1998

Your next playlist includes 10 songs from decade dec_60s.
+-----------------------------------+----------------------+----+-------------------+
|title                              |artist_name           |year|jaccard_similarity |
+-----------------------------------+----------------------+----+-------------------+
|Jimmy Mack                         |Martha & The Vandellas|1966|0.6666666666666666 |
|S.O.S                              |Edwin Starr           |1969|0.6666666666666666 |
|It's Gonna Work Out Fine           |Ike & Tina Turner     |1966|0.6666666666666666 |
|Morris Park                        |Lenni Sesar           |1969|0.6666666666666666 |
|When I'm Gone                      |Brenda Holloway       |1965|0.6666666666666666 |
|Ghetto Raga (2003 Digital Remaster)|Third Ear Band        |1969|0.42857142857142855|
|Stone Crazy                        |Screamin' Jay Hawkins |1969|0.42857142857142855|
|Hot Summer Day                     |It's A Beautiful Day  |1969|0.42857142857142855|
|Then He Kissed Me                  |The Crystals          |1963|0.42857142857142855|
|Somebody To Love                   |Jefferson Airplane    |1967|0.42857142857142855|
+-----------------------------------+----------------------+----+-------------------+
only showing top 10 rows

In [217]:
find_similar_songs(songchoice_id, 'dec_50s', 10)
Song of choice:  Stop (Stretch 'N' Vern's Rock & Roll Mix)
Artist:  Spice Girls
Year:  1998

Your next playlist includes 10 songs from decade dec_50s.
+------------------------------+-------------------+----+-------------------+
|title                         |artist_name        |year|jaccard_similarity |
+------------------------------+-------------------+----+-------------------+
|La Bamba                      |Ritchie Valens     |1958|0.6666666666666666 |
|At The Hop                    |Danny & The Juniors|1957|0.42857142857142855|
|Almost Grown                  |Chuck Berry        |1959|0.42857142857142855|
|How Are Ya' Fixed For Love?   |Frank Sinatra      |1958|0.42857142857142855|
|Way Down Yonder In New Orleans|Freddy Cannon      |1959|0.42857142857142855|
|Old Maid                      |Big Bopper         |1959|0.42857142857142855|
|Sonny Boy                     |Toots Thielemans   |1955|0.42857142857142855|
|Everyday I Have The Blues     |B.B. King          |1956|0.42857142857142855|
|Problems                      |The Everly Brothers|1958|0.42857142857142855|
|Susie Q                       |Dale Hawkins       |1957|0.42857142857142855|
+------------------------------+-------------------+----+-------------------+
only showing top 10 rows

In [218]:
find_similar_songs(songchoice_id, 'dec_40s', 10)
Song of choice:  Stop (Stretch 'N' Vern's Rock & Roll Mix)
Artist:  Spice Girls
Year:  1998

Your next playlist includes 10 songs from decade dec_40s.
+----------------------------------+------------------------+----+-------------------+
|title                             |artist_name             |year|jaccard_similarity |
+----------------------------------+------------------------+----+-------------------+
|Tennessee Saturday Night          |Red Foley               |1948|0.42857142857142855|
|Old Maid Boogie                   |Eddie "Cleanhead" Vinson|1947|0.42857142857142855|
|Aberdeen Mississippi Blues        |Bukka White             |1940|0.25               |
|Day By Day                        |Al Stafford             |1945|0.25               |
|I Still Get a Thrill              |Harry Belafonte         |1949|0.25               |
|Strange Things Happening Every Day|Sister Rosetta Tharpe   |1945|0.25               |
|Whoopin' the Blues                |Sonny Terry             |1945|0.25               |
|Old Maid Boogie                   |Eddie Vinson            |1947|0.25               |
|Good Morning Heartache            |Billie Holiday          |1946|0.25               |
|Mind Your Own Business            |Hank Williams           |1949|0.25               |
+----------------------------------+------------------------+----+-------------------+
only showing top 10 rows

In [219]:
find_similar_songs(songchoice_id, 'dec_30s', 10)
Song of choice:  Stop (Stretch 'N' Vern's Rock & Roll Mix)
Artist:  Spice Girls
Year:  1998

Your next playlist includes 10 songs from decade dec_30s.
+--------------------------------+---------------------------------------+----+------------------+
|title                           |artist_name                            |year|jaccard_similarity|
+--------------------------------+---------------------------------------+----+------------------+
|Don't You Leave Me Here         |Jelly Roll Morton's New Orleans Jazzmen|1939|0.25              |
|Police Station Blues            |Peetie Wheatstraw                      |1932|0.25              |
|Tuxedo Junction                 |Erskine Hawkins & His Orchestra        |1939|0.25              |
|Cocaine Habit Blues             |Memphis Jug Band                       |1930|0.1111111111111111|
|Tuxedo Junction                 |Erskine Hawkins and His Orchestra      |1939|0.1111111111111111|
|Blue Lou                        |Fletcher Henderson And His Orchestra   |1936|0.1111111111111111|
|Sitting On Top Of The World     |Mississippi Sheiks                     |1930|0.1111111111111111|
|Jeepers creepers                |Louis Armstrong                        |1939|0.1111111111111111|
|Turpentine Blues                |Tampa Red                              |1932|0.1111111111111111|
|I've Got My Love To Keep Me Warm|Billie Holiday                         |1937|0.1111111111111111|
+--------------------------------+---------------------------------------+----+------------------+
only showing top 10 rows

In [242]:
find_similar_songs(songchoice_id, 'dec_20s', 10)
Song of choice:  Stop (Stretch 'N' Vern's Rock & Roll Mix)
Artist:  Spice Girls
Year:  1998

Your next playlist includes 10 songs from decade dec_20s.
+-----------------------------------------+----------------------+----+-------------------+
|title                                    |artist_name           |year|jaccard_similarity |
+-----------------------------------------+----------------------+----+-------------------+
|Broke And Hungry                         |Blind Lemon Jefferson |1927|0.42857142857142855|
|The Prisoner's Song                      |Vernon Dalhart        |1924|0.25               |
|Bedtime Blues                            |Frank Stokes          |1928|0.25               |
|He's Got Me Goin'                        |Bessie Smith          |1929|0.25               |
|Goin' Places                             |Joe Venuti_ Eddie Lang|1927|0.1111111111111111 |
|Nobody Knows You When You're Down And Out|Bessie Smith          |1929|0.1111111111111111 |
|Ain't misbehavin'                        |Fats Waller           |1929|0.1111111111111111 |
|Down The Dirt Road Blues                 |Charley Patton        |1929|0.1111111111111111 |
|Just Because                             |Nelstone´s Hawaiians  |1929|0.1111111111111111 |
|Corn Liquor Blues                        |Papa Charlie Jackson  |1929|0.1111111111111111 |
+-----------------------------------------+----------------------+----+-------------------+
only showing top 10 rows
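
The similarity scores in these tables only ever take a few values (1.0, 0.667, 0.429, 0.25, 0.111). That pattern is what Jaccard similarity produces when both songs are described by sets of the same small size. The sketch below is a minimal illustration, assuming five items per song; the set size is inferred from the printed values and is not stated in this section. The helper names `jaccard` and `possible_scores` are hypothetical and are not part of `find_similar_songs`.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def possible_scores(set_size):
    """Map k (number of shared items) to the Jaccard score of two sets
    that each hold `set_size` items and share exactly k of them."""
    base = set(range(set_size))
    scores = {}
    for k in range(set_size + 1):
        # Shift the second set so that it overlaps `base` in exactly k items.
        other = set(range(set_size - k, 2 * set_size - k))
        scores[k] = jaccard(base, other)
    return scores


if __name__ == "__main__":
    # Assumed set size of 5 (hypothetical, inferred from the scores above).
    for shared, score in sorted(possible_scores(5).items(), reverse=True):
        print(shared, "shared items ->", score)
```

Under that assumption, sharing 4, 3, 2 or 1 of five items gives 4/6, 3/7, 2/8 and 1/9, which matches the values that appear in the older decades, where exact matches become rare.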

Mamma Mia by ABBA (the dataset entry used here is the 1996 version credited to Abbacadabra)

In [243]:
songchoice_id = 'SORMLJU12A8C13EEEE'
find_similar_songs(songchoice_id, 'dec_2k1s', 20)
Song of choice:  Mamma Mia
Artist:  Abbacadabra
Year:  1996

Your next playlist includes 20 songs from decade dec_2k1s.
+-----------------------------------------+------------------------------------+----+------------------+
|title                                    |artist_name                         |year|jaccard_similarity|
+-----------------------------------------+------------------------------------+----+------------------+
|Stick To My Side (Four Tet version)      |Pantha Du Prince                    |2010|1.0               |
|Feed The Freezer                         |Nic Fanciulli                       |2010|1.0               |
|Atomic Chapel                            |1349                                |2010|0.6666666666666666|
|In The Heart Of Her Own Magic Field      |Kaipa                               |2010|0.6666666666666666|
|Sunday Disco Romance                     |Mathew Jonson                       |2010|0.6666666666666666|
|Driven                                   |Ben Nicky                           |2010|0.6666666666666666|
|Pilot In The Sky Of Dreams (Instrumental)|Threshold                           |2010|0.6666666666666666|
|Skeptical                                |God Module                          |2010|0.6666666666666666|
|Who We Are                               |Luigi Lusini                        |2010|0.6666666666666666|
|Trouble Comes Running                    |Spoon                               |2010|0.6666666666666666|
|Higienopolis                             |Christian Smith                     |2010|0.6666666666666666|
|Keep It Goin' Louder                     |Major Lazer / Nina Sky / Ricky Blaze|2010|0.6666666666666666|
|Tírame a un volcán                       |Tachenko                            |2010|0.6666666666666666|
|Night Winds                              |Geographer                          |2010|0.6666666666666666|
|Devil In You                             |The Watson Twins                    |2010|0.6666666666666666|
|Summer Skin                              |JPL                                 |2010|0.6666666666666666|
|Ridin' In My Car                         |She & Him                           |2010|0.6666666666666666|
|A Nomad's Retreat                        |Pantha Du Prince                    |2010|0.6666666666666666|
|Starlight                                |Steve Brian                         |2010|0.6666666666666666|
|Chance                                   |Cobblestone Jazz                    |2010|0.6666666666666666|
+-----------------------------------------+------------------------------------+----+------------------+
only showing top 20 rows

In [244]:
find_similar_songs(songchoice_id, 'dec_2ks', 10, True)
Song of choice:  Mamma Mia
Artist:  Abbacadabra
Year:  1996

Your next playlist includes 10 songs from decade dec_2ks.
+--------------------+--------------------+----+------------------+
|               title|         artist_name|year|jaccard_similarity|
+--------------------+--------------------+----+------------------+
|     Touched the sky|Dennis Ferrer fea...|2007|               1.0|
|       Every Morning|    Andreas Kauffelt|2006|               1.0|
|   Freed From Desire|    Backside Artists|2008|               1.0|
|       Mother Mature|            Kino Oko|2008|               1.0|
|Make You Mine (Fr...|        Miami Horror|2009|               1.0|
|The great leap fo...|        Red Sparowes|2006|               1.0|
|Language Of The F...|           1200 Mics|2003|               1.0|
|Die Wahrheit (Alb...|               Kante|2006|               1.0|
|We Don't Care (Mu...|        Audio Bullys|2003|               1.0|
|Over And Over (So...|            Hot Chip|2006|               1.0|
+--------------------+--------------------+----+------------------+
only showing top 10 rows

In [223]:
find_similar_songs(songchoice_id, 'dec_90s', 20)
Song of choice:  Mamma Mia
Artist:  Abbacadabra
Year:  1996

Your next playlist includes 20 songs from decade dec_90s.
+---------------------------------------+-------------------+----+------------------+
|title                                  |artist_name        |year|jaccard_similarity|
+---------------------------------------+-------------------+----+------------------+
|Mamma Mia                              |Abbacadabra        |1996|1.0               |
|Def=Lim                                |Montauk P          |1998|1.0               |
|Fields Of Haar-Meggido                 |Behemoth           |1993|1.0               |
|This Love                              |Stavesacre         |1999|1.0               |
|To Enter Your Mountain                 |bathory            |1991|1.0               |
|Boss Of Nova (LP Version)              |Gerald Albright    |1991|1.0               |
|Pronoun                                |Bird of Ill Omen   |1997|1.0               |
|Troppe Emozioni                        |Bluvertigo         |1997|1.0               |
|Eternal Return                         |Ubar Tmar          |1997|1.0               |
|Been Around                            |Adeva              |1999|1.0               |
|Tour De France (Kling Klang Analog Mix)|Kraftwerk          |1999|1.0               |
|I've Made Enough Friends               |The Wrens          |1996|0.6666666666666666|
|A Ra                                   |Leo Gandelman      |1999|0.6666666666666666|
|Over Your Shoulder (LP Version)        |Seven Mary Three   |1998|0.6666666666666666|
|Si Pudiera                             |Los Suaves         |1994|0.6666666666666666|
|The Rains Came                         |Sir Douglas Quintet|1990|0.6666666666666666|
|Crowded In The Wings                   |The Jayhawks       |1992|0.6666666666666666|
|Girl Like That  (LP Version)           |matchbox twenty    |1996|0.6666666666666666|
|Guardian Angel                         |Juno Reactor       |1995|0.6666666666666666|
|Travel Agent                           |Fu Manchu          |1995|0.6666666666666666|
+---------------------------------------+-------------------+----+------------------+
only showing top 20 rows

In [224]:
find_similar_songs(songchoice_id, 'dec_80s', 10)
Song of choice:  Mamma Mia
Artist:  Abbacadabra
Year:  1996

Your next playlist includes 10 songs from decade dec_80s.
+----------------------------------------+----------------+----+------------------+
|title                                   |artist_name     |year|jaccard_similarity|
+----------------------------------------+----------------+----+------------------+
|On & On                                 |Jesse Saunders  |1989|1.0               |
|Breaking All The Rules                  |Peter Frampton  |1981|1.0               |
|Bedsitter                               |Soft Cell       |1981|1.0               |
|Arrivederci Solo (2004 Digital Remaster)|TC Matic        |1983|0.6666666666666666|
|Burning Down                            |Play Dead       |1985|0.6666666666666666|
|Paid In Full                            |Eric B. & Rakim |1987|0.6666666666666666|
|The Golden Dawn (2002 Digital Remaster) |The Church      |1982|0.6666666666666666|
|Stranger                                |Pallas          |1986|0.6666666666666666|
|Princess Of The Dawn                    |Accept          |1982|0.6666666666666666|
|Hipocritas                              |La Polla Records|1987|0.6666666666666666|
+----------------------------------------+----------------+----+------------------+
only showing top 10 rows

In [225]:
find_similar_songs(songchoice_id, 'dec_70s', 10)
Song of choice:  Mamma Mia
Artist:  Abbacadabra
Year:  1996

Your next playlist includes 10 songs from decade dec_70s.
+----------------------+------------------------+----+------------------+
|title                 |artist_name             |year|jaccard_similarity|
+----------------------+------------------------+----+------------------+
|The One And Only      |Gladys Knight & The Pips|1978|0.6666666666666666|
|Hey Joe               |Jimi Hendrix            |1970|0.6666666666666666|
|Death Walks Behind You|Atomic Rooster          |1971|0.6666666666666666|
|Disco Magic           |T-Connection            |1976|0.6666666666666666|
|Road Song             |Pat Martino             |1975|0.6666666666666666|
|Give me love          |Cerrone                 |1978|0.6666666666666666|
|Willow Tree           |Black Uhuru             |1977|0.6666666666666666|
|Rose Coloured Glasses |The Raspberries         |1974|0.6666666666666666|
|Born To Wander        |Rare Earth              |1970|0.6666666666666666|
|Luz De Vela           |O Terco                 |1976|0.6666666666666666|
+----------------------+------------------------+----+------------------+
only showing top 10 rows

In [226]:
find_similar_songs(songchoice_id, 'dec_60s', 10)
Song of choice:  Mamma Mia
Artist:  Abbacadabra
Year:  1996

Your next playlist includes 10 songs from decade dec_60s.
+-----------------------------------+-------------------------+----+-------------------+
|title                              |artist_name              |year|jaccard_similarity |
+-----------------------------------+-------------------------+----+-------------------+
|Turtle Walk                        |Lou Donaldson            |1969|0.6666666666666666 |
|Never Going Back                   |The Lovin' Spoonful      |1968|0.6666666666666666 |
|Repeat After Me                    |The Three Sounds         |1969|0.6666666666666666 |
|Lisbon Antiqua                     |The Three Suns           |1960|0.6666666666666666 |
|Radar                              |Link Wray & The Wraymen  |1960|0.6666666666666666 |
|Soldier's Plea                     |Marvin Gaye              |1963|0.6666666666666666 |
|There She Goes Again               |Velvet Underground & Nico|1967|0.6666666666666666 |
|Pop Giant                          |Ekseption                |1969|0.6666666666666666 |
|Ghetto Raga (2003 Digital Remaster)|Third Ear Band           |1969|0.42857142857142855|
|Grizzly Bear                       |The Youngbloods          |1967|0.42857142857142855|
+-----------------------------------+-------------------------+----+-------------------+
only showing top 10 rows

In [227]:
find_similar_songs(songchoice_id, 'dec_50s', 10)
Song of choice:  Mamma Mia
Artist:  Abbacadabra
Year:  1996

Your next playlist includes 10 songs from decade dec_50s.
+------------------------------+-------------------+----+-------------------+
|title                         |artist_name        |year|jaccard_similarity |
+------------------------------+-------------------+----+-------------------+
|Almost Grown                  |Chuck Berry        |1959|0.6666666666666666 |
|There Is Nothin' Like A Dame  |Percy Faith        |1958|0.42857142857142855|
|Reconsider Baby               |Lowell Fulson      |1954|0.42857142857142855|
|Tallahassee Lassie            |Freddy Cannon      |1959|0.42857142857142855|
|How Are Ya' Fixed For Love?   |Frank Sinatra      |1958|0.42857142857142855|
|Way Down Yonder In New Orleans|Freddy Cannon      |1959|0.42857142857142855|
|Good Golly Miss Molly         |Little Richard     |1958|0.42857142857142855|
|Old Maid                      |Big Bopper         |1959|0.42857142857142855|
|Problems                      |The Everly Brothers|1958|0.42857142857142855|
|Susie Q                       |Dale Hawkins       |1957|0.42857142857142855|
+------------------------------+-------------------+----+-------------------+
only showing top 10 rows

In [228]:
find_similar_songs(songchoice_id, 'dec_40s', 10)
Song of choice:  Mamma Mia
Artist:  Abbacadabra
Year:  1996

Your next playlist includes 10 songs from decade dec_40s.
+----------------------------------+------------------------+----+-------------------+
|title                             |artist_name             |year|jaccard_similarity |
+----------------------------------+------------------------+----+-------------------+
|Old Maid Boogie                   |Eddie "Cleanhead" Vinson|1947|0.42857142857142855|
|Evidence                          |Thelonious Monk         |1947|0.25               |
|T-Bone Shuffle                    |T-Bone Walker           |1949|0.25               |
|Tennessee Saturday Night          |Red Foley               |1948|0.25               |
|That's All Right                  |Arthur "Big Boy" Crudup |1946|0.25               |
|I Still Get a Thrill              |Harry Belafonte         |1949|0.25               |
|Strange Things Happening Every Day|Sister Rosetta Tharpe   |1945|0.25               |
|Whoopin' the Blues                |Sonny Terry             |1945|0.25               |
|Old Maid Boogie                   |Eddie Vinson            |1947|0.25               |
|Mind Your Own Business            |Hank Williams           |1949|0.25               |
+----------------------------------+------------------------+----+-------------------+
only showing top 10 rows

In [229]:
find_similar_songs(songchoice_id, 'dec_30s', 10)
Song of choice:  Mamma Mia
Artist:  Abbacadabra
Year:  1996

Your next playlist includes 10 songs from decade dec_30s.
+--------------------------------+---------------------------------------+----+------------------+
|title                           |artist_name                            |year|jaccard_similarity|
+--------------------------------+---------------------------------------+----+------------------+
|Don't You Leave Me Here         |Jelly Roll Morton's New Orleans Jazzmen|1939|0.25              |
|Old Shep                        |Red Foley                              |1936|0.25              |
|Tuxedo Junction                 |Erskine Hawkins & His Orchestra        |1939|0.25              |
|Cocaine Habit Blues             |Memphis Jug Band                       |1930|0.1111111111111111|
|Tuxedo Junction                 |Erskine Hawkins and His Orchestra      |1939|0.1111111111111111|
|Blue Lou                        |Fletcher Henderson And His Orchestra   |1936|0.1111111111111111|
|Sitting On Top Of The World     |Mississippi Sheiks                     |1930|0.1111111111111111|
|Jeepers creepers                |Louis Armstrong                        |1939|0.1111111111111111|
|Turpentine Blues                |Tampa Red                              |1932|0.1111111111111111|
|I've Got My Love To Keep Me Warm|Billie Holiday                         |1937|0.1111111111111111|
+--------------------------------+---------------------------------------+----+------------------+
only showing top 10 rows

In [245]:
find_similar_songs(songchoice_id, 'dec_20s', 10, True)
Song of choice:  Mamma Mia
Artist:  Abbacadabra
Year:  1996

Your next playlist includes 10 songs from decade dec_20s.
+--------------------+--------------------+----+------------------+
|               title|         artist_name|year|jaccard_similarity|
+--------------------+--------------------+----+------------------+
|    Broke And Hungry|Blind Lemon Jeffe...|1927|              0.25|
| The Prisoner's Song|      Vernon Dalhart|1924|              0.25|
|Judge Harsh Blues...|         Furry Lewis|1928|0.1111111111111111|
|        Goin' Places|Joe Venuti_ Eddie...|1927|0.1111111111111111|
|Nobody Knows You ...|        Bessie Smith|1929|0.1111111111111111|
|   Ain't misbehavin'|         Fats Waller|1929|0.1111111111111111|
|Down The Dirt Roa...|      Charley Patton|1929|0.1111111111111111|
|       Bedtime Blues|        Frank Stokes|1928|0.1111111111111111|
|   He's Got Me Goin'|        Bessie Smith|1929|0.1111111111111111|
|        Just Because|Nelstone´s Hawaiians|1929|0.1111111111111111|
+--------------------+--------------------+----+------------------+
only showing top 10 rows

Billie Jean by Michael Jackson

In [14]:
songchoice_id = 'SOLISQK12A8C1416AF'
find_similar_songs(songchoice_id, 'dec_2k1s', 20)
Song of choice:  Billie Jean
Artist:  Michael Jackson
Year:  1982

Your next playlist includes 20 songs from decade dec_2k1s.
+------------------------------------------+----------------------------------+----+------------------+
|title                                     |artist_name                       |year|jaccard_similarity|
+------------------------------------------+----------------------------------+----+------------------+
|Lover Of Mine                             |Beach House                       |2010|1.0               |
|Nitetime rainbows (The Buddy System remix)|A Sunny Day In Glasgow            |2010|1.0               |
|Chemistry Will Find Me                    |Emma Pollock                      |2010|1.0               |
|0009                                      |This Is Head                      |2010|0.6666666666666666|
|Mind Elevation                            |K. Sparks featuring Cymarshall Law|2010|0.6666666666666666|
|Ubbidirò                                  |Biagio Antonacci                  |2010|0.6666666666666666|
|Jõud                                      |Metsatöll                         |2010|0.6666666666666666|
|Albatross                                 |The Besnard Lakes                 |2010|0.6666666666666666|
|Manon                                     |Christophe Maé                    |2010|0.6666666666666666|
|Getting By_ High_ and Strange             |Kris Kristofferson                |2010|0.6666666666666666|
|Expresso Madureira                        |Incognito                         |2010|0.6666666666666666|
|Reflection                                |Lemongrass feat. Karen Gibson Roc |2010|0.6666666666666666|
|Gegen die Wand                            |Creme Fresh                       |2010|0.6666666666666666|
|Better Days                               |Babylonia                         |2010|0.6666666666666666|
|Violent Dreams                            |Crystal Castles                   |2010|0.6666666666666666|
|Aufgeraucht                               |Fehlfarben                        |2010|0.6666666666666666|
|Iridium                                   |Dark Tranquillity                 |2010|0.6666666666666666|
|Diplomat's Son                            |Vampire Weekend                   |2010|0.6666666666666666|
|El Tren                                   |Extremoduro                       |2010|0.6666666666666666|
|The Meaning of Life                       |Vargo                             |2010|0.6666666666666666|
+------------------------------------------+----------------------------------+----+------------------+
only showing top 20 rows

In [15]:
find_similar_songs(songchoice_id, 'dec_2ks', 10)
Song of choice:  Billie Jean
Artist:  Michael Jackson
Year:  1982

Your next playlist includes 10 songs from decade dec_2ks.
+--------------------------------+----------------------------+----+------------------+
|title                           |artist_name                 |year|jaccard_similarity|
+--------------------------------+----------------------------+----+------------------+
|Nothing On Earth (Album Version)|Jeff Deyo                   |2007|1.0               |
|Ain't It Hard                   |Sharon Jones & The Dap-Kings|2002|1.0               |
|Life Events and Sinking Ships   |The Umbrella Sequence       |2007|1.0               |
|Peaches                         |A. Skillz & Krafty Kuts     |2004|1.0               |
|Optical Illusions               |William Orbit               |2009|1.0               |
|Honeycomb Tripe                 |To Live & Shave In L.A.     |2002|1.0               |
|Takes A Lot Of Tryin'           |The Tyde                    |2003|1.0               |
|U In The Stars                  |Bell X1                     |2003|1.0               |
|Instinct                        |Stereotypical Working Class |2003|1.0               |
|New Toys                        |The Cooper Temple Clause    |2003|1.0               |
+--------------------------------+----------------------------+----+------------------+
only showing top 10 rows

In [16]:
find_similar_songs(songchoice_id, 'dec_90s', 20)
Song of choice:  Billie Jean
Artist:  Michael Jackson
Year:  1982

Your next playlist includes 20 songs from decade dec_90s.
+---------------------------------------------------------------+---------------------------+----+------------------+
|title                                                          |artist_name                |year|jaccard_similarity|
+---------------------------------------------------------------+---------------------------+----+------------------+
|Apache Rose Peacock (Album Version)                            |Red Hot Chili Peppers      |1991|1.0               |
|Darkness And Light                                             |Yonder Mountain String Band|1999|1.0               |
|Not For You                                                    |Pearl Jam                  |1994|1.0               |
|The Well                                                       |Tarnation                  |1995|1.0               |
|Regular Girl                                                   |Kool Keith                 |1997|1.0               |
|And On And On                                                  |Janet Jackson              |1994|1.0               |
|Rock Me Amadeus (Ihn liebten alle Frauen...) - ( Live Version )|Falco                      |1999|1.0               |
|I'll Make It Right                                             |Usher                      |1994|1.0               |
|Sentence                                                       |Era                        |1999|1.0               |
|Por Amarte                                                     |Los Prisioneros            |1990|1.0               |
|November im Mai                                                |Puhdys                     |1997|1.0               |
|Wake Me When It's Over (Album Version)                         |Candy Dulfer               |1995|1.0               |
|In Yer Face                                                    |808 State                  |1991|1.0               |
|Sinful Bliss                                                   |Swollen Members            |1999|1.0               |
|Only Heaven Knows ( LP Version )                               |Foreigner                  |1991|1.0               |
|Eternal Life                                                   |Jeff Buckley               |1993|1.0               |
|Johnny B. Goode                                                |Los Suaves                 |1995|1.0               |
|Missing                                                        |Everything But The Girl    |1994|1.0               |
|Natural Born Bugie                                             |Humble Pie                 |1990|1.0               |
|Ritual Day                                                     |Velour 100                 |1997|1.0               |
+---------------------------------------------------------------+---------------------------+----+------------------+
only showing top 20 rows

In [17]:
find_similar_songs(songchoice_id, 'dec_80s', 10)
Song of choice:  Billie Jean
Artist:  Michael Jackson
Year:  1982

Your next playlist includes 10 songs from decade dec_80s.
+--------------------------+----------------------+----+------------------+
|title                     |artist_name           |year|jaccard_similarity|
+--------------------------+----------------------+----+------------------+
|Heaven Is A 4 Letter Word |Bad English           |1989|1.0               |
|Outside World             |Midnight Oil          |1982|1.0               |
|The Crush Of Love         |Joe Satriani          |1988|1.0               |
|Giving In                 |Aretha Franklin       |1983|1.0               |
|Out My Way (Album Version)|Meat Puppets          |1986|1.0               |
|Evil's Rising             |Holy Terror           |1987|1.0               |
|I'm Not The Only One      |Atlanta Rhythm Section|1989|1.0               |
|Ropa Violeta              |Luis Alberto Spinetta |1986|1.0               |
|Que Me Pisen              |SUMO                  |1986|1.0               |
|Night By Night            |Michael Stanley Band  |1982|1.0               |
+--------------------------+----------------------+----+------------------+
only showing top 10 rows

In [18]:
find_similar_songs(songchoice_id, 'dec_70s', 10)
Song of choice:  Billie Jean
Artist:  Michael Jackson
Year:  1982

Your next playlist includes 10 songs from decade dec_70s.
+----------------------+------------------------------------+----+------------------+
|title                 |artist_name                         |year|jaccard_similarity|
+----------------------+------------------------------------+----+------------------+
|You're Mean           |Chicken Shack / Stan Webb           |1976|1.0               |
|Alice                 |Mott The Hoople                     |1974|1.0               |
|Vengeance (LP Version)|Carly Simon                         |1979|1.0               |
|Flirtin' With Disaster|Molly Hatchet                       |1979|1.0               |
|Eat Starch Mom        |Jefferson Airplane                  |1972|1.0               |
|So You Win Again      |Hot Chocolate                       |1978|1.0               |
|Baby It's You         |Racey                               |1979|1.0               |
|Free Money            |Penetration                         |1979|1.0               |
|The Way Of The Pilgrim|Mahavishnu Orchestra;John McLaughlin|1976|0.6666666666666666|
|Don't Stop Believin'  |Olivia Newton-John                  |1976|0.6666666666666666|
+----------------------+------------------------------------+----+------------------+
only showing top 10 rows

In [19]:
find_similar_songs(songchoice_id, 'dec_60s', 10)
Song of choice:  Billie Jean
Artist:  Michael Jackson
Year:  1982

Your next playlist includes 10 songs from decade dec_60s.
+---------------------------------------+--------------------+----+------------------+
|title                                  |artist_name         |year|jaccard_similarity|
+---------------------------------------+--------------------+----+------------------+
|Waitin' For The Wind                   |Spooky Tooth        |1969|0.6666666666666666|
|Taboo (LP Version)                     |Booker T. & The MG's|1966|0.6666666666666666|
|Too Weak to Fight                      |Clarence Carter     |1968|0.6666666666666666|
|Humanoid Boogie (2007 Digital Remaster)|The Bonzo Dog Band  |1968|0.6666666666666666|
|Michael And The Slipper Tree           |The Equals          |1968|0.6666666666666666|
|Just One Smile                         |Blood_ Sweat & Tears|1968|0.6666666666666666|
|Rattlesnake Shake                      |Peter Green         |1969|0.6666666666666666|
|Butcher's Tale (Western Front 1914)    |The Zombies         |1968|0.6666666666666666|
|Louise                                 |Howlin' Wolf        |1965|0.6666666666666666|
|A Day In The Life                      |Wes Montgomery      |1967|0.6666666666666666|
+---------------------------------------+--------------------+----+------------------+
only showing top 10 rows

In [20]:
find_similar_songs(songchoice_id, 'dec_50s', 10)
Song of choice:  Billie Jean
Artist:  Michael Jackson
Year:  1982

Your next playlist includes 10 songs from decade dec_50s.
+---------------------------------+-------------------------------------+----+-------------------+
|title                            |artist_name                          |year|jaccard_similarity |
+---------------------------------+-------------------------------------+----+-------------------+
|Everyday I Have The Blues        |B.B. King                            |1956|0.6666666666666666 |
|When The Saints Go Marching In   |The Isley Brothers                   |1959|0.6666666666666666 |
|He's Gone                        |Chantels                             |1958|0.42857142857142855|
|Honest I Do                      |Jimmy Reed                           |1957|0.42857142857142855|
|At The Hop                       |Danny & The Juniors                  |1957|0.42857142857142855|
|Almost Like Being in Love        |Red Garland                          |1957|0.42857142857142855|
|Low And Lonely                   |Roy Acuff And His Smoky Mountain Boys|1951|0.42857142857142855|
|Reconsider Baby                  |Lowell Fulson                        |1954|0.42857142857142855|
|Ends and Odds - Original         |Jimmy Reed                           |1959|0.42857142857142855|
|They Can't Take That Away From Me|Ella Fitzgerald / Louis Armstrong    |1956|0.42857142857142855|
+---------------------------------+-------------------------------------+----+-------------------+
only showing top 10 rows

In [21]:
find_similar_songs(songchoice_id, 'dec_40s', 10)
Song of choice:  Billie Jean
Artist:  Michael Jackson
Year:  1982

Your next playlist includes 10 songs from decade dec_40s.
+-------------------------------+----------------------------------------------------+----+-------------------+
|title                          |artist_name                                         |year|jaccard_similarity |
+-------------------------------+----------------------------------------------------+----+-------------------+
|Whoopin' the Blues             |Sonny Terry                                         |1945|0.42857142857142855|
|Aberdeen Mississippi Blues     |Bukka White                                         |1940|0.25               |
|Manteca                        |Dizzy Gillespie & His Orchestra;Luciano "Chano" Pozo|1948|0.25               |
|The Fat Man                    |Fats Domino                                         |1949|0.25               |
|That's All Right               |Arthur "Big Boy" Crudup                             |1946|0.25               |
|Muskrat                        |Merle Travis                                        |1947|0.25               |
|Dig This Boogie                |Wynonie Harris                                      |1946|0.25               |
|Deacon Jones                   |Louis Jordan and his Tympany Five                   |1943|0.25               |
|Parchman Farm Blues            |Bukka White                                         |1940|0.25               |
|I Dreamed of an Old Love Affair|Jimmie Davis                                        |1942|0.25               |
+-------------------------------+----------------------------------------------------+----+-------------------+
only showing top 10 rows

In [22]:
find_similar_songs(songchoice_id, 'dec_30s', 10)
Song of choice:  Billie Jean
Artist:  Michael Jackson
Year:  1982

Your next playlist includes 10 songs from decade dec_30s.
+---------------------------+------------------------------------+----+-------------------+
|title                      |artist_name                         |year|jaccard_similarity |
+---------------------------+------------------------------------+----+-------------------+
|Blue Lou                   |Fletcher Henderson And His Orchestra|1936|0.42857142857142855|
|Dry Well Blues             |Charley Patton                      |1931|0.42857142857142855|
|Tuxedo Junction            |Erskine Hawkins and His Orchestra   |1939|0.25               |
|Turpentine Blues           |Tampa Red                           |1932|0.25               |
|Moon Going Down            |Charley Patton                      |1930|0.25               |
|Police Station Blues       |Peetie Wheatstraw                   |1932|0.25               |
|Rootin' Ground Hog         |Big Joe Williams                    |1937|0.1111111111111111 |
|When Your Way Gets Dark    |Charlie Patton                      |1930|0.1111111111111111 |
|Bear Cat's Kitten          |Tampa Red                           |1930|0.1111111111111111 |
|Sitting On Top Of The World|Mississippi Sheiks                  |1930|0.1111111111111111 |
+---------------------------+------------------------------------+----+-------------------+
only showing top 10 rows

In [23]:
find_similar_songs(songchoice_id, 'dec_20s', 10)
Song of choice:  Billie Jean
Artist:  Michael Jackson
Year:  1982

Your next playlist includes 10 songs from decade dec_20s.
+-----------------------------------------+---------------------+----+------------------+
|title                                    |artist_name          |year|jaccard_similarity|
+-----------------------------------------+---------------------+----+------------------+
|Ain't misbehavin'                        |Fats Waller          |1929|0.25              |
|Down The Dirt Road Blues                 |Charley Patton       |1929|0.25              |
|Just Because                             |Nelstone´s Hawaiians |1929|0.25              |
|Corn Liquor Blues                        |Papa Charlie Jackson |1929|0.25              |
|Broke And Hungry                         |Blind Lemon Jefferson|1927|0.1111111111111111|
|West Coast Blues                         |Blind Blake          |1926|0.1111111111111111|
|Judge Harsh Blues  (Alternate take)      |Furry Lewis          |1928|0.1111111111111111|
|The Prisoner's Song                      |Vernon Dalhart       |1924|0.1111111111111111|
|Nobody Knows You When You're Down And Out|Bessie Smith         |1929|0.1111111111111111|
|Henry Ford Blues                         |Roosevelt Sykes      |1929|0.1111111111111111|
+-----------------------------------------+---------------------+----+------------------+
only showing top 10 rows
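
The Jaccard scores in the Billie Jean tables above take only a few distinct values: 1.0, 0.667, 0.429, 0.25 and 0.111. These are exactly the values k / (10 - k) for k = 5, 4, 3, 2, 1, which is what Jaccard similarity yields when two sets of five items share k items. The sketch below reproduces that ladder. It assumes each song is reduced to a set of five feature tokens. This is an inference from the scores, not something the outputs confirm. The token names and the helper `candidate_with_overlap` are hypothetical.

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets: define the similarity as 0.0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


def candidate_with_overlap(k, size=5):
    """Build a hypothetical candidate song that shares k tokens with the query."""
    shared = [f"f{i}" for i in range(k)]          # tokens also in the query
    unique = [f"g{i}" for i in range(size - k)]   # tokens the query lacks
    return set(shared + unique)


# Assumption: the chosen song is described by five feature tokens.
query = {"f0", "f1", "f2", "f3", "f4"}

# Score candidates that share 5, 4, 3, 2, 1 and 0 tokens with the query.
scores = {k: jaccard_similarity(query, candidate_with_overlap(k))
          for k in range(5, -1, -1)}

for k, score in scores.items():
    print(f"shared tokens: {k}  jaccard_similarity: {score}")
```

Under this assumption, a score of 1.0 means all five tokens match, so many songs can tie at the top of a table, and the order among tied rows carries no extra information.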

Gravity by Sara Bareilles

Based on the results, Gravity, released in 2004, is similar to Adia, released in 1997 by Sarah McLachlan. One of the authors confirmed that both singers are among her preferences because of their voice and genre, which is why she listens to both their songs. I Am Woman by Helen Reddy, released in 1972, sounds louder than Gravity, but the author still liked the song's tempo and key.

In [24]:
songchoice_id = 'SONZPPA12AF72A9E13'
find_similar_songs(songchoice_id, 'dec_2k1s', 20)
Song of choice:  Gravity
Artist:  Sara Bareilles
Year:  2004

Your next playlist includes 20 songs from decade dec_2k1s.
+-------------------------------------------+------------------+----+-------------------+
|title                                      |artist_name       |year|jaccard_similarity |
+-------------------------------------------+------------------+----+-------------------+
|This Is It                                 |Inspectah Deck    |2010|0.6666666666666666 |
|Enough Is Enough                           |Babylonia         |2010|0.6666666666666666 |
|Hymne à Québec                             |Loco Locass       |2010|0.6666666666666666 |
|Take Over The World                        |The Courteeners   |2010|0.6666666666666666 |
|Warm Water                                 |Lucky Soul        |2010|0.6666666666666666 |
|The Twentieth Century Is Almost Over       |Johnny Cash       |2010|0.6666666666666666 |
|Zoom In Fantasize                          |Jahcoozi          |2010|0.6666666666666666 |
|Miss Universum                             |Pariisin Kevät    |2010|0.6666666666666666 |
|Starting                                   |Matt Pond PA      |2010|0.6666666666666666 |
|Long Looks                                 |Cale Parks        |2010|0.6666666666666666 |
|Keys                                       |Kim Richey        |2010|0.6666666666666666 |
|Principal                                  |Mark Isham        |2010|0.6666666666666666 |
|Out Of Africa                              |Angelique Kidjo   |2010|0.6666666666666666 |
|Empty My Hands                             |Tenth Avenue North|2010|0.6666666666666666 |
|Space Rocket - Hit The Iras (Piano Version)|Ira Atari & Rampue|2010|0.42857142857142855|
|Find My Way Back                           |Four Year Strong  |2010|0.42857142857142855|
|Keep The Tension On                        |Gowan             |2010|0.42857142857142855|
|Calendar Girls                             |U-N-I             |2010|0.42857142857142855|
|La Noche De Que Te Hablé                   |Celtas Cortos     |2010|0.42857142857142855|
|Virgin Witch (Album Version)               |Rob Zombie        |2010|0.42857142857142855|
+-------------------------------------------+------------------+----+-------------------+
only showing top 20 rows

In [25]:
find_similar_songs(songchoice_id, 'dec_2ks', 10)
Song of choice:  Gravity
Artist:  Sara Bareilles
Year:  2004

Your next playlist includes 10 songs from decade dec_2ks.
+-----------------------------------------------------------------------------------+-------------------+----+------------------+
|title                                                                              |artist_name        |year|jaccard_similarity|
+-----------------------------------------------------------------------------------+-------------------+----+------------------+
|Dona Cila                                                                          |Maria Gadú         |2009|1.0               |
|Caltone Special                                                                    |Tommy McCook       |2004|1.0               |
|Medicine Dance (World)                                                             |Burning Sky        |2002|1.0               |
|Untitled                                                                           |Simian             |2000|1.0               |
|But California                                                                     |Eg                 |2009|1.0               |
|¿Adónde Fue Cecilia?                                                               |Kany Garcia        |2007|1.0               |
|This Land Is Your Land                                                             |Cisco Houston      |2002|1.0               |
|Holy City/ Bayete (Medley)                                                         |Soweto Gospel Choir|2003|1.0               |
|The Fruit That Ate Itself (Bonus Track (Emboldened Navigator And The Seagull Dots))|Frog Eyes          |2003|1.0               |
|Georgia On My Mind (Album Version)                                                 |Michael Bublé      |2009|1.0               |
+-----------------------------------------------------------------------------------+-------------------+----+------------------+
only showing top 10 rows

In [26]:
find_similar_songs(songchoice_id, 'dec_90s', 20)
Song of choice:  Gravity
Artist:  Sara Bareilles
Year:  2004

Your next playlist includes 20 songs from decade dec_90s.
+--------------------------------+-------------------------------+----+------------------+
|title                           |artist_name                    |year|jaccard_similarity|
+--------------------------------+-------------------------------+----+------------------+
|Caroline_ No                    |Brian Wilson                   |1995|1.0               |
|Letter From A Concerned Follower|Pedro The Lion                 |1999|1.0               |
|I Hate                          |Urinals                        |1997|1.0               |
|Bal Paré                        |Bal Paré                       |1994|1.0               |
|Ice Water                       |Cat Power                      |1996|1.0               |
|Adia                            |Sarah McLachlan                |1997|1.0               |
|Call Me Steam                   |Jeremy Enigk                   |1996|1.0               |
|Goodbye                         |Benny Goodman and His Orchestra|1992|1.0               |
|Mari Madalenas                  |Platero Y Tu                   |1996|1.0               |
|Faux semblant                   |Edith Lefel                    |1992|1.0               |
|Kappes Und Kohl                 |Hanns Dieter Hüsch             |1998|1.0               |
|Hodge Podge                     |Johnny Hodges                  |1994|1.0               |
|Shamrock Waltz                  |Nathan Abshire                 |1993|1.0               |
|Lady Bird                       |Fats Navarro                   |1996|1.0               |
|I Can't Believe It's Not Better |The Lucksmiths                 |1999|1.0               |
|Only For You                    |The Outlaws                    |1994|1.0               |
|Fonn Mharta                     |Clannad                        |1996|1.0               |
|Now Or Never                    |Lisa Ekdahl                    |1998|1.0               |
|Shelter                         |Poor Rich Ones                 |1996|1.0               |
|Come Together                   |Paul Weller & Friends          |1995|1.0               |
+--------------------------------+-------------------------------+----+------------------+
only showing top 20 rows

In [27]:
find_similar_songs(songchoice_id, 'dec_80s', 10)
Song of choice:  Gravity
Artist:  Sara Bareilles
Year:  2004

Your next playlist includes 10 songs from decade dec_80s.
+--------------------------------------+---------------------+----+------------------+
|title                                 |artist_name          |year|jaccard_similarity|
+--------------------------------------+---------------------+----+------------------+
|Boys And Girls (2003 Digital Remaster)|The Human League     |1980|1.0               |
|A Heart Disease Called Love           |John Cooper Clarke   |1982|1.0               |
|Robotic Reggae                        |Tippa Irie           |1986|1.0               |
|My Way                                |Elvis Presley        |1989|1.0               |
|Everyday (I Have The Blues)           |BB King              |1988|1.0               |
|Rio Greyhound (Instrumental)          |Stan Ridgway         |1986|0.6666666666666666|
|Silly Girl                            |Descendents          |1985|0.6666666666666666|
|Flesh Eater                           |Sunglasses After Dark|1984|0.6666666666666666|
|Spion                                 |Udo Lindenberg       |1986|0.6666666666666666|
|Just Like Love                        |Winter Hours         |1989|0.6666666666666666|
+--------------------------------------+---------------------+----+------------------+
only showing top 10 rows

In [28]:
find_similar_songs(songchoice_id, 'dec_70s', 10)
Song of choice:  Gravity
Artist:  Sara Bareilles
Year:  2004

Your next playlist includes 10 songs from decade dec_70s.
+------------------------------+-------------------+----+------------------+
|title                         |artist_name        |year|jaccard_similarity|
+------------------------------+-------------------+----+------------------+
|I Don't Know You Anymore      |Tony Orlando & Dawn|1973|1.0               |
|Two Of Us                     |Joe Vitale         |1974|1.0               |
|Weekend In New England        |Barry Manilow      |1976|1.0               |
|I Am Woman                    |Helen Reddy        |1972|1.0               |
|Factory                       |Bruce Springsteen  |1978|1.0               |
|Creation of Love              |The Whispers       |1973|1.0               |
|Blackmail                     |The Runaways       |1976|1.0               |
|Take Him (You Can Have My Man)|Jean Knight        |1971|1.0               |
|Only In Your Heart            |America            |1972|1.0               |
|Knocks Me Off My Feet         |Stevie Wonder      |1976|0.6666666666666666|
+------------------------------+-------------------+----+------------------+
only showing top 10 rows

In [29]:
find_similar_songs(songchoice_id, 'dec_60s', 10)
Song of choice:  Gravity
Artist:  Sara Bareilles
Year:  2004

Your next playlist includes 10 songs from decade dec_60s.
+--------------------------------+------------------------+----+------------------+
|title                           |artist_name             |year|jaccard_similarity|
+--------------------------------+------------------------+----+------------------+
|Secret Love                     |Marvin Gaye / Kim Weston|1966|1.0               |
|Big Bad John                    |Jimmy Dean              |1961|1.0               |
|Countdown For Blofeld           |John Barry              |1967|1.0               |
|Portobello Road                 |Cat Stevens             |1966|0.6666666666666666|
|Sanguine                        |Yves Montand            |1962|0.6666666666666666|
|Born A Woman                    |Sandy Posey             |1966|0.6666666666666666|
|Hearts Like Ours (Album Version)|Connie Smith            |1965|0.6666666666666666|
|I'm A Man                       |The Yardbirds           |1964|0.6666666666666666|
|Back In My Arms Again           |The Supremes            |1965|0.6666666666666666|
|Dedicated To the One I Love     |The Shirelles           |1960|0.6666666666666666|
+--------------------------------+------------------------+----+------------------+
only showing top 10 rows

In [30]:
find_similar_songs(songchoice_id, 'dec_50s', 10)
Song of choice:  Gravity
Artist:  Sara Bareilles
Year:  2004

Your next playlist includes 10 songs from decade dec_50s.
+-------------------------------+-------------------------------+----+------------------+
|title                          |artist_name                    |year|jaccard_similarity|
+-------------------------------+-------------------------------+----+------------------+
|Rebel Rouser                   |Duane Eddy                     |1958|0.6666666666666666|
|KC Loving                      |Little Willie Littlefield      |1952|0.6666666666666666|
|Mr. Lee                        |The Bobbettes                  |1957|0.6666666666666666|
|Return To Paradise             |Martin Denny                   |1956|0.6666666666666666|
|Whirlaway                      |Allen Toussaint                |1958|0.6666666666666666|
|I Let A Song Go Out Of My Heart|Toots Thielemans               |1955|0.6666666666666666|
|I'm Not A Know-It-All          |Frankie Lymon And The Teenagers|1956|0.6666666666666666|
|The Girl Can't Help It         |Little Richard                 |1956|0.6666666666666666|
|Whose Shoulder Will You Cry On |Kitty Wells                    |1956|0.6666666666666666|
|Stella By Starlight            |The Three Suns                 |1956|0.6666666666666666|
+-------------------------------+-------------------------------+----+------------------+
only showing top 10 rows

In [31]:
find_similar_songs(songchoice_id, 'dec_40s', 10)
Song of choice:  Gravity
Artist:  Sara Bareilles
Year:  2004

Your next playlist includes 10 songs from decade dec_40s.
+----------------------------------+----------------------------------------------------+----+-------------------+
|title                             |artist_name                                         |year|jaccard_similarity |
+----------------------------------+----------------------------------------------------+----+-------------------+
|I Love You Because                |Leon Payne                                          |1949|1.0                |
|I Still Get a Thrill              |Harry Belafonte                                     |1949|0.6666666666666666 |
|Strange Things Happening Every Day|Sister Rosetta Tharpe                               |1945|0.6666666666666666 |
|God Bless The Child               |Billie Holiday                                      |1941|0.6666666666666666 |
|Good Morning Heartache            |Billie Holiday                                      |1946|0.6666666666666666 |
|Embraceable You                   |Frank Sinatra                                       |1947|0.42857142857142855|
|Evidence                          |Thelonious Monk                                     |1947|0.42857142857142855|
|Manteca                           |Dizzy Gillespie & His Orchestra;Luciano "Chano" Pozo|1948|0.42857142857142855|
|The Fat Man                       |Fats Domino                                         |1949|0.42857142857142855|
|Tennessee Saturday Night          |Red Foley                                           |1948|0.42857142857142855|
+----------------------------------+----------------------------------------------------+----+-------------------+
only showing top 10 rows

In [32]:
find_similar_songs(songchoice_id, 'dec_30s', 10)
Song of choice:  Gravity
Artist:  Sara Bareilles
Year:  2004

Your next playlist includes 10 songs from decade dec_30s.
+---------------------------+------------------------------------+----+-------------------+
|title                      |artist_name                         |year|jaccard_similarity |
+---------------------------+------------------------------------+----+-------------------+
|Sitting On Top Of The World|Mississippi Sheiks                  |1930|0.6666666666666666 |
|Turpentine Blues           |Tampa Red                           |1932|0.6666666666666666 |
|Milk Cow Blues             |Sleepy John Estes                   |1930|0.6666666666666666 |
|Cocaine Habit Blues        |Memphis Jug Band                    |1930|0.42857142857142855|
|Tuxedo Junction            |Erskine Hawkins and His Orchestra   |1939|0.42857142857142855|
|Blue Lou                   |Fletcher Henderson And His Orchestra|1936|0.42857142857142855|
|Jeepers creepers           |Louis Armstrong                     |1939|0.42857142857142855|
|Tuxedo Junction            |Erskine Hawkins & His Orchestra     |1939|0.42857142857142855|
|Rootin' Ground Hog         |Big Joe Williams                    |1937|0.25               |
|Love Is the Thing          |Ethel Waters                        |1933|0.25               |
+---------------------------+------------------------------------+----+-------------------+
only showing top 10 rows

In [33]:
find_similar_songs(songchoice_id, 'dec_20s', 10)
Song of choice:  Gravity
Artist:  Sara Bareilles
Year:  2004

Your next playlist includes 10 songs from decade dec_20s.
+-----------------------------------------+---------------------+----+-------------------+
|title                                    |artist_name          |year|jaccard_similarity |
+-----------------------------------------+---------------------+----+-------------------+
|Nobody Knows You When You're Down And Out|Bessie Smith         |1929|0.6666666666666666 |
|Ain't misbehavin'                        |Fats Waller          |1929|0.42857142857142855|
|He's Got Me Goin'                        |Bessie Smith         |1929|0.42857142857142855|
|Love Changing Blues                      |Blind Willie McTell  |1929|0.42857142857142855|
|Broke And Hungry                         |Blind Lemon Jefferson|1927|0.25               |
|Tailor Made Lover                        |Papa Charlie Jackson |1929|0.25               |
|Memphis Yo Yo Blues                      |Memphis Jug Band     |1929|0.25               |
|The Prisoner's Song                      |Vernon Dalhart       |1924|0.25               |
|Down The Dirt Road Blues                 |Charley Patton       |1929|0.25               |
|Just Because                             |Nelstone´s Hawaiians |1929|0.25               |
+-----------------------------------------+---------------------+----+-------------------+
only showing top 10 rows

Que Sera, Sera (Whatever Will Be, Will Be) by Doris Day

In [49]:
songchoice_id = 'SORXLWS12AB01866F3'
find_similar_songs(songchoice_id, 'dec_2k1s', 20)
Song of choice:  Que Sera_ Sera (Whatever Will Be_ Will Be)
Artist:  Doris Day
Year:  0

Your next playlist includes 20 songs from decade dec_2k1s.
+---------------------------------------------+-------------------+----+-------------------+
|title                                        |artist_name        |year|jaccard_similarity |
+---------------------------------------------+-------------------+----+-------------------+
|Aquiles por su talon es Aquiles              |Jorge Drexler      |2010|1.0                |
|Wants Out                                    |Martin Sexton      |2010|1.0                |
|Ooh Boy                                      |Sleepy Sun         |2010|0.6666666666666666 |
|Warm Water                                   |Lucky Soul         |2010|0.6666666666666666 |
|Gallop (Demo)                                |Adam Green         |2010|0.6666666666666666 |
|Back In The Night (Live)                     |Dr Feelgood        |2010|0.42857142857142855|
|Everything To Me                             |Monica             |2010|0.42857142857142855|
|Rigamaroo                                    |Sleepy Sun         |2010|0.42857142857142855|
|Enough Is Enough                             |Babylonia          |2010|0.42857142857142855|
|Nanny Explains The Rules                     |James Newton Howard|2010|0.42857142857142855|
|Now We're Gone                               |Kings Go Forth     |2010|0.42857142857142855|
|'til That Day Was Gone                       |Thom Hell          |2010|0.42857142857142855|
|Tuo Fei Lun                                  |Eason Chan         |2010|0.42857142857142855|
|Q&A - Sucking 1000 Dicks In Front Of Your Mom|Joe Rogan          |2010|0.42857142857142855|
|Fireball                                     |Tony Sly           |2010|0.42857142857142855|
|More Than Worthless                          |Drowning Pool      |2010|0.42857142857142855|
|Zoom In Fantasize                            |Jahcoozi           |2010|0.42857142857142855|
|Break The Spell                              |Gogol Bordello     |2010|0.42857142857142855|
|Wildcat Fights                               |Eyeless In Gaza    |2010|0.42857142857142855|
|The Shadow Of An Empire                      |Fionn Regan        |2010|0.42857142857142855|
+---------------------------------------------+-------------------+----+-------------------+
only showing top 20 rows

In [50]:
find_similar_songs(songchoice_id, 'dec_2ks', 10)
Song of choice:  Que Sera_ Sera (Whatever Will Be_ Will Be)
Artist:  Doris Day
Year:  0

Your next playlist includes 10 songs from decade dec_2ks.
+----------------------+------------------------+----+------------------+
|title                 |artist_name             |year|jaccard_similarity|
+----------------------+------------------------+----+------------------+
|We Will Be Apart      |Bodies Of Water         |2007|1.0               |
|A Death Waltz         |Jay Brannan             |2008|1.0               |
|Nibbles               |Jacob Fred Jazz Odyssey |2003|1.0               |
|Chacarera De Los Gatos|María Elena Walsh       |2005|1.0               |
|These Roses           |Gin Wigmore             |2008|1.0               |
|Kwiaty                |Grzegorz Turnau         |2005|1.0               |
|Appleworm             |Black Moth Super Rainbow|2009|1.0               |
|Whispering Pines      |Jakob Dylan             |2007|1.0               |
|Boss Eye              |Milanese                |2006|0.6666666666666666|
|The Heart Worships    |Hayley Westenra         |2007|0.6666666666666666|
+----------------------+------------------------+----+------------------+
only showing top 10 rows

In [51]:
find_similar_songs(songchoice_id, 'dec_90s', 20)
Song of choice:  Que Sera_ Sera (Whatever Will Be_ Will Be)
Artist:  Doris Day
Year:  0

Your next playlist includes 20 songs from decade dec_90s.
+----------------------------------------------+-------------------------------------+----+------------------+
|title                                         |artist_name                          |year|jaccard_similarity|
+----------------------------------------------+-------------------------------------+----+------------------+
|Hyazinthen                                    |Jürgen von der Lippe                 |1999|1.0               |
|Happy (Love Theme From "Lady Sings The Blues")|Michael Jackson                      |1992|1.0               |
|That Hospital                                 |Loudon Wainwright III                |1995|1.0               |
|The Dirt of the Vineyard                      |Cursive                              |1997|1.0               |
|Smile                                         |Laura Nyro                           |1997|1.0               |
|Melsie                                        |Legs on Earth                        |1999|1.0               |
|The Event Horizon                             |Air Miami                            |1994|1.0               |
|Daddy                                         |Donna Fargo                          |1997|1.0               |
|Stading Room Only_ Mr. Mars                   |Reign Ghost                          |1990|0.6666666666666666|
|Nakasaki (I Need A Lover Tonight)             |Ken Doh                              |1996|0.6666666666666666|
|Off With His Cardigan!                        |The Lucksmiths                       |1996|0.6666666666666666|
|Forfeit Trials                                |The Harvest Ministers                |1993|0.6666666666666666|
|Enchanted                                     |Book Of Love                         |1993|0.6666666666666666|
|Gone Home                                     |Stevie Ray Vaughan And Double Trouble|1996|0.6666666666666666|
|It's Only Make Believe                        |Billy Fury                           |1990|0.6666666666666666|
|On The Run (Album Version)                    |Manfred Mann's Earth Band            |1996|0.6666666666666666|
|Your Thoughts And Mine                        |Tarnation                            |1997|0.6666666666666666|
|Anyway                                        |Nichole Nordeman                     |1998|0.6666666666666666|
|JJ Slow                                       |Chokebore                            |1995|0.6666666666666666|
|Worlds                                        |Steve Roach                          |1992|0.6666666666666666|
+----------------------------------------------+-------------------------------------+----+------------------+
only showing top 20 rows

In [52]:
find_similar_songs(songchoice_id, 'dec_80s', 10)
Song of choice:  Que Sera_ Sera (Whatever Will Be_ Will Be)
Artist:  Doris Day
Year:  0

Your next playlist includes 10 songs from decade dec_80s.
+---------------------------+-------------------------+----+------------------+
|title                      |artist_name              |year|jaccard_similarity|
+---------------------------+-------------------------+----+------------------+
|My Love                    |Rick James               |1982|1.0               |
|Old Rock 'N Roller         |The Charlie Daniels Band |1989|1.0               |
|Pretty Africa              |Desmond Dekker           |1980|0.6666666666666666|
|Sister Fate (LP Version)   |Sheila E                 |1985|0.6666666666666666|
|Newgrange                  |Clannad                  |1983|0.6666666666666666|
|Critical List              |Fleshtones               |1982|0.6666666666666666|
|For Billie (LP Version)    |Julius Hemphill          |1988|0.6666666666666666|
|On The Run (Album Version) |Manfred Mann's Earth Band|1980|0.6666666666666666|
|Find My Love               |Fairground Attraction    |1988|0.6666666666666666|
|I'll Never Need Anyone More|Michael Stanley Band     |1980|0.6666666666666666|
+---------------------------+-------------------------+----+------------------+
only showing top 10 rows

In [53]:
find_similar_songs(songchoice_id, 'dec_70s', 10)
Song of choice:  Que Sera_ Sera (Whatever Will Be_ Will Be)
Artist:  Doris Day
Year:  0

Your next playlist includes 10 songs from decade dec_70s.
+---------------------------------------+-------------------+----+------------------+
|title                                  |artist_name        |year|jaccard_similarity|
+---------------------------------------+-------------------+----+------------------+
|Where Are You                          |The Main Ingredient|1972|1.0               |
|A Really Good Time                     |Roxy Music         |1974|0.6666666666666666|
|If You Go Away                         |Terry Jacks        |1974|0.6666666666666666|
|Country Side Of Life                   |Wet Willie         |1974|0.6666666666666666|
|Behind The Wall Of Sleep               |Black Sabbath      |1970|0.6666666666666666|
|Love Of My Life (1993 Digital Remaster)|Queen              |1975|0.6666666666666666|
|Barefootin' (LP Version)               |Brownsville Station|1973|0.6666666666666666|
|Lincoln Freed Me Today (The Slave)     |Joan Baez          |1971|0.6666666666666666|
|Vagabond                               |Ferris             |1971|0.6666666666666666|
|Make Love To Your Mind                 |Bill Withers       |1975|0.6666666666666666|
+---------------------------------------+-------------------+----+------------------+
only showing top 10 rows

In [54]:
find_similar_songs(songchoice_id, 'dec_60s', 10)
Song of choice:  Que Sera_ Sera (Whatever Will Be_ Will Be)
Artist:  Doris Day
Year:  0

Your next playlist includes 10 songs from decade dec_60s.
+----------------------------------+-------------------------------+----+------------------+
|title                             |artist_name                    |year|jaccard_similarity|
+----------------------------------+-------------------------------+----+------------------+
|The Lost Soul                     |The Doc Watson Family          |1963|1.0               |
|Quero Esquecer Você               |Jorge Ben                      |1963|1.0               |
|Something Fishy                   |Dolly Parton                   |1967|0.6666666666666666|
|Better Use Your Head              |Little Anthony & The Imperials |1966|0.6666666666666666|
|Locking Up My Heart               |The Marvelettes                |1963|0.6666666666666666|
|Hub Caps And Tail Lights          |Henry Mancini & His Orchestra  |1961|0.6666666666666666|
|Every Little Bit of Love          |Spirits and Worm               |1969|0.6666666666666666|
|If You See My Baby (2001 Remaster)|Merle Haggard And The Strangers|1968|0.6666666666666666|
|Got Love If You Want It           |The Yardbirds                  |1964|0.6666666666666666|
|Song About The Rain               |The Stone Poneys               |1967|0.6666666666666666|
+----------------------------------+-------------------------------+----+------------------+
only showing top 10 rows

In [55]:
find_similar_songs(songchoice_id, 'dec_50s', 10)
Song of choice:  Que Sera_ Sera (Whatever Will Be_ Will Be)
Artist:  Doris Day
Year:  0

Your next playlist includes 10 songs from decade dec_50s.
+------------------------+-----------------+----+------------------+
|title                   |artist_name      |year|jaccard_similarity|
+------------------------+-----------------+----+------------------+
|That's All Right        |Jimmy Rodgers    |1950|1.0               |
|See See Rider           |Lightnin' Hopkins|1955|1.0               |
|Invitation              |Les Baxter       |1955|0.6666666666666666|
|A Little Bench Of Rushes|Seamus Ennis     |1958|0.6666666666666666|
|Blue Moon               |Elvis Presley    |1956|0.6666666666666666|
|Blue Prelude            |Nina Simone      |1959|0.6666666666666666|
|Loneliness Of Evening   |Percy Faith      |1958|0.6666666666666666|
|Chicken Talk            |Yma Sumac        |1954|0.6666666666666666|
|Temperature             |Little Walter    |1957|0.6666666666666666|
|Little Girl             |Ritchie Valens   |1959|0.6666666666666666|
+------------------------+-----------------+----+------------------+
only showing top 10 rows

In [56]:
find_similar_songs(songchoice_id, 'dec_40s', 10)
Song of choice:  Que Sera_ Sera (Whatever Will Be_ Will Be)
Artist:  Doris Day
Year:  0

Your next playlist includes 10 songs from decade dec_40s.
+-------------------------------+-----------------------+----+-------------------+
|title                          |artist_name            |year|jaccard_similarity |
+-------------------------------+-----------------------+----+-------------------+
|Chicago Blues                  |Arthur Crudup          |1948|0.6666666666666666 |
|Don't You Lie To Me            |Tampa Red              |1941|0.42857142857142855|
|T-Bone Shuffle                 |T-Bone Walker          |1949|0.42857142857142855|
|Deep As The River              |Harry Belafonte        |1949|0.42857142857142855|
|I Love You Because             |Leon Payne             |1949|0.42857142857142855|
|So In Love                     |Cole Porter            |1949|0.42857142857142855|
|Sixteen Tons                   |Merle Travis           |1947|0.42857142857142855|
|Boy Friend Blues               |Arthur "Big Boy" Crudup|1948|0.42857142857142855|
|Good Morning Heartache         |Billie Holiday         |1946|0.42857142857142855|
|I Dreamed of an Old Love Affair|Jimmie Davis           |1942|0.42857142857142855|
+-------------------------------+-----------------------+----+-------------------+
only showing top 10 rows

In [57]:
find_similar_songs(songchoice_id, 'dec_30s', 10)
Song of choice:  Que Sera_ Sera (Whatever Will Be_ Will Be)
Artist:  Doris Day
Year:  0

Your next playlist includes 10 songs from decade dec_30s.
+---------------------------+------------------------------------+----+-------------------+
|title                      |artist_name                         |year|jaccard_similarity |
+---------------------------+------------------------------------+----+-------------------+
|Milk Cow Blues             |Sleepy John Estes                   |1930|0.6666666666666666 |
|Cocaine Habit Blues        |Memphis Jug Band                    |1930|0.42857142857142855|
|Blue Lou                   |Fletcher Henderson And His Orchestra|1936|0.42857142857142855|
|Sitting On Top Of The World|Mississippi Sheiks                  |1930|0.42857142857142855|
|Moon Going Down            |Charley Patton                      |1930|0.42857142857142855|
|Milk Cow Blues             |Kokomo Arnold                       |1934|0.42857142857142855|
|Rootin' Ground Hog         |Big Joe Williams                    |1937|0.25               |
|When Your Way Gets Dark    |Charlie Patton                      |1930|0.25               |
|Bear Cat's Kitten          |Tampa Red                           |1930|0.25               |
|Love Is the Thing          |Ethel Waters                        |1933|0.25               |
+---------------------------+------------------------------------+----+-------------------+
only showing top 10 rows

In [58]:
find_similar_songs(songchoice_id, 'dec_20s', 10)
Song of choice:  Que Sera_ Sera (Whatever Will Be_ Will Be)
Artist:  Doris Day
Year:  0

Your next playlist includes 10 songs from decade dec_20s.
+-----------------------------------------+----------------------+----+-------------------+
|title                                    |artist_name           |year|jaccard_similarity |
+-----------------------------------------+----------------------+----+-------------------+
|Tailor Made Lover                        |Papa Charlie Jackson  |1929|0.42857142857142855|
|Memphis Yo Yo Blues                      |Memphis Jug Band      |1929|0.42857142857142855|
|Nobody Knows You When You're Down And Out|Bessie Smith          |1929|0.42857142857142855|
|Love Changing Blues                      |Blind Willie McTell   |1929|0.42857142857142855|
|West Coast Blues                         |Blind Blake           |1926|0.25               |
|Goin' Places                             |Joe Venuti_ Eddie Lang|1927|0.25               |
|Ain't misbehavin'                        |Fats Waller           |1929|0.25               |
|Bedtime Blues                            |Frank Stokes          |1928|0.25               |
|He's Got Me Goin'                        |Bessie Smith          |1929|0.25               |
|Broke And Hungry                         |Blind Lemon Jefferson |1927|0.1111111111111111 |
+-----------------------------------------+----------------------+----+-------------------+
only showing top 10 rows
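
A note on reading the scores: the `jaccard_similarity` column in these tables takes only a few distinct values (1.0, 0.667, 0.429, 0.25, 0.111). These are exactly the Jaccard similarities of two five-element sets that share 5, 4, 3, 2, or 1 elements. The sketch below checks that arithmetic. Representing each song as a set of five items is an assumption made for illustration, not something the output confirms, and the helper names `jaccard` and `five_element_scores` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def five_element_scores():
    """Similarity of two 5-element sets for every possible overlap size.

    Assumption for illustration only: each song is described by five items.
    """
    base = set(range(5))
    scores = {}
    for shared in range(6):
        # Keep `shared` items in common and fill the rest with items
        # that cannot collide with `base`.
        other = set(range(shared)) | {100 + i for i in range(5 - shared)}
        scores[shared] = jaccard(base, other)
    return scores


scores = five_element_scores()
for shared in sorted(scores, reverse=True):
    print(shared, scores[shared])
```

Under this reading, a score of 1.0 means all five items match and 0.1111 means only one does. That would explain why many songs tie at the same score and why the older decades, which have fewer candidate songs, top out at lower values.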

Can't Help Falling In Love by Elvis Presley

In [69]:
songchoice_id = 'SOBLILW12A8C143D33'
find_similar_songs(songchoice_id, 'dec_2k1s', 20)
Song of choice:  Can't Help Falling In Love
Artist:  Elvis Presley
Year:  1961

Your next playlist includes 20 songs from decade dec_2k1s.
+----------------------+---------------------------------+----+------------------+
|title                 |artist_name                      |year|jaccard_similarity|
+----------------------+---------------------------------+----+------------------+
|Ubbidirò              |Biagio Antonacci                 |2010|0.6666666666666666|
|Whoop and Hollar      |Ray Wylie Hubbard                |2010|0.6666666666666666|
|Reflection            |Lemongrass feat. Karen Gibson Roc|2010|0.6666666666666666|
|Siempre Igual         |Los Autenticos Decadentes        |2010|0.6666666666666666|
|Telegram              |Strong Arm Steady                |2010|0.6666666666666666|
|The Meaning of Life   |Vargo                            |2010|0.6666666666666666|
|Las Vegas             |Oddisee                          |2010|0.6666666666666666|
|Day N Nite (Accapella)|Crookers                         |2010|0.6666666666666666|
|Choosing Numbers      |Field Music                      |2010|0.6666666666666666|
|Kiss Her Goodbye      |The Golden Filter                |2010|0.6666666666666666|
|American Boom         |The Wave Pictures                |2010|0.6666666666666666|
|Bored Games           |Wild Nothing                     |2010|0.6666666666666666|
|Drained Out           |Audio Bullys                     |2010|0.6666666666666666|
|Lately feat. Miguel   |U-N-I                            |2010|0.6666666666666666|
|Smile                 |Strong Arm Steady                |2010|0.6666666666666666|
|Oceanliner            |Madrugada                        |2010|0.6666666666666666|
|Dark Rain             |The Beauty of Gemina             |2010|0.6666666666666666|
|A Key Turns           |The Depreciation Guild           |2010|0.6666666666666666|
|Bomba nel cuore       |Il pan del diavolo               |2010|0.6666666666666666|
|Nolla                 |Anssi Kela                       |2010|0.6666666666666666|
+----------------------+---------------------------------+----+------------------+
only showing top 20 rows

In [70]:
find_similar_songs(songchoice_id, 'dec_2ks', 10)
Song of choice:  Can't Help Falling In Love
Artist:  Elvis Presley
Year:  1961

Your next playlist includes 10 songs from decade dec_2ks.
+-------------------------------------------+--------------------+----+------------------+
|title                                      |artist_name         |year|jaccard_similarity|
+-------------------------------------------+--------------------+----+------------------+
|A Bir                                      |Kurban              |2005|1.0               |
|Drive By                                   |Oh No               |2009|1.0               |
|Accordian                                  |Madvillain          |2004|1.0               |
|Truth In Your Words                        |They Might Be Giants|2001|1.0               |
|L'Arrestation (Instrumental)               |Le Roi Soleil       |2006|1.0               |
|SpongeBob SquarePants Theme (Movie Version)|The Pirates         |2004|1.0               |
|Intro                                      |Rockin' Da North    |2002|1.0               |
|No Choice - Just Suck (Part 1)             |Micropoint          |2000|1.0               |
|Teddybears Live 'n' Direct                 |Teddybears Sthlm    |2000|1.0               |
|The Fence Feels Its Post                   |Frog Eyes           |2004|1.0               |
+-------------------------------------------+--------------------+----+------------------+
only showing top 10 rows

In [71]:
find_similar_songs(songchoice_id, 'dec_90s', 20)
Song of choice:  Can't Help Falling In Love
Artist:  Elvis Presley
Year:  1961

Your next playlist includes 20 songs from decade dec_90s.
+----------------------------+-------------------------+----+------------------+
|title                       |artist_name              |year|jaccard_similarity|
+----------------------------+-------------------------+----+------------------+
|Soneranputsausjenkka        |Jope Ruonansuu           |1999|1.0               |
|In Heaven                   |Pixies                   |1991|1.0               |
|Blue Eyes (Don't Run Away)  |Link Wray                |1990|1.0               |
|Milk                        |Tony Mason-Cox           |1990|1.0               |
|Tanzen                      |Paso Doble               |1997|0.6666666666666666|
|Corazón                     |Los Auténticos Decadentes|1995|0.6666666666666666|
|Do You Really Want Me       |Jon Secada               |1992|0.6666666666666666|
|I Kissed A Girl (LP Version)|Jill Sobule              |1995|0.6666666666666666|
|Can't Pretend               |Beatnik Termites         |1999|0.6666666666666666|
|Mr Happy Reveller           |Flowered Up              |1991|0.6666666666666666|
|Dejandonos Caer             |7 Notas 7 Colores        |1999|0.6666666666666666|
|Michaelmas                  |Geneva                   |1997|0.6666666666666666|
|Intro                       |5th Ward Boyz            |1993|0.6666666666666666|
|Crawl Back Home             |The Pietasters           |1999|0.6666666666666666|
|Circles                     |Soul Coughing            |1998|0.6666666666666666|
|Henry Rollins Is No Fun     |Chixdiggit!              |1996|0.6666666666666666|
|Brillante sobre el mic      |Fito Paez                |1992|0.6666666666666666|
|Tania                       |Fruko Y Sus Tesos        |1995|0.6666666666666666|
|The Ascension               |Nick Glennie-Smith       |1998|0.6666666666666666|
|Smoked Oak                  |David Holmes             |1995|0.6666666666666666|
+----------------------------+-------------------------+----+------------------+
only showing top 20 rows

In [72]:
find_similar_songs(songchoice_id, 'dec_80s', 10)
Song of choice:  Can't Help Falling In Love
Artist:  Elvis Presley
Year:  1961

Your next playlist includes 10 songs from decade dec_80s.
+------------------------------------+--------------------------------+----+------------------+
|title                               |artist_name                     |year|jaccard_similarity|
+------------------------------------+--------------------------------+----+------------------+
|Survival Of The Streets             |Cro-Mags                        |1986|1.0               |
|Over The River And Through The Woods|The Chipmunks With David Seville|1980|1.0               |
|Mr. Disco                           |New Order                       |1989|0.6666666666666666|
|Skápate                             |Desorden Público                |1988|0.6666666666666666|
|Leuchtturm                          |Nena                            |1983|0.6666666666666666|
|Sangre                              |Angeles del Infierno            |1984|0.6666666666666666|
|Festival Of Colours                 |The Creatures                   |1983|0.6666666666666666|
|Don't Knock Upon My Door            |Billy Fury                      |1988|0.6666666666666666|
|Hate Breeders (Live)                |The Misfits                     |1982|0.6666666666666666|
|Spin                                |The Darling Buds                |1988|0.6666666666666666|
+------------------------------------+--------------------------------+----+------------------+
only showing top 10 rows

In [73]:
find_similar_songs(songchoice_id, 'dec_70s', 10)
Song of choice:  Can't Help Falling In Love
Artist:  Elvis Presley
Year:  1961

Your next playlist includes 10 songs from decade dec_70s.
+--------------------------------------+---------------------+----+------------------+
|title                                 |artist_name          |year|jaccard_similarity|
+--------------------------------------+---------------------+----+------------------+
|At Last I Am Free                     |Chic                 |1978|1.0               |
|Baia                                  |Idris Muhammad       |1975|0.6666666666666666|
|Shake A Hand (If You Can)             |Little Richard       |1971|0.6666666666666666|
|Pay To The Piper                      |Chairmen Of The Board|1970|0.6666666666666666|
|All Night Long                        |Dexter Wansel        |1978|0.6666666666666666|
|Baby Don't Go                         |Karla Bonoff         |1979|0.6666666666666666|
|It's Only Love (1999 Digital Remaster)|Bryan Ferry          |1976|0.6666666666666666|
|It's My Party (1999 Digital Remaster) |Bryan Ferry          |1973|0.6666666666666666|
|Big green car                         |Polecats             |1979|0.6666666666666666|
|I've Got You Under My Skin            |Gloria Gaynor        |1976|0.6666666666666666|
+--------------------------------------+---------------------+----+------------------+
only showing top 10 rows

In [74]:
find_similar_songs(songchoice_id, 'dec_60s', 10)
Song of choice:  Can't Help Falling In Love
Artist:  Elvis Presley
Year:  1961

Your next playlist includes 10 songs from decade dec_60s.
+--------------------------+----------------------------------------+----+------------------+
|title                     |artist_name                             |year|jaccard_similarity|
+--------------------------+----------------------------------------+----+------------------+
|Crazy thing nr 1          |Tasavallan Presidentti                  |1969|1.0               |
|Can't Help Falling In Love|Elvis Presley                           |1961|1.0               |
|Money Penny Goes For Broke|Burt Bacharach                          |1967|0.6666666666666666|
|Main Title (True Grit )   |Elmer Bernstein                         |1969|0.6666666666666666|
|Blagged                   |Peter Sarstedt                          |1969|0.6666666666666666|
|Squaws Along The Yukon    |Hank Thompson And His Brazos Valley Boys|1963|0.6666666666666666|
|Marcello Magaroni         |Irwin Goodman                           |1966|0.6666666666666666|
|Put Yourself In My Place  |The Elgins                              |1966|0.6666666666666666|
|Patricia                  |Perez Prado                             |1960|0.6666666666666666|
|It's Over                 |Glen Campbell                           |1967|0.6666666666666666|
+--------------------------+----------------------------------------+----+------------------+
only showing top 10 rows
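
The chosen song appears in its own 1960s playlist above with a similarity of 1.0, because it belongs to the decade being searched. A minimal sketch of dropping it before display follows. It uses plain Python tuples rather than the notebook's Spark DataFrame. Only the query ID is taken from the cell above; the other IDs are placeholders, and `drop_query_song` is a hypothetical helper.

```python
QUERY_ID = "SOBLILW12A8C143D33"  # song ID set in cell In [69]

# Hypothetical rows: (song_id, title, artist_name, year, jaccard_similarity).
# Titles and scores are copied from the table above; the two
# PLACEHOLDER IDs are invented for illustration.
rows = [
    ("PLACEHOLDER_ID_1", "Crazy thing nr 1", "Tasavallan Presidentti", 1969, 1.0),
    (QUERY_ID, "Can't Help Falling In Love", "Elvis Presley", 1961, 1.0),
    ("PLACEHOLDER_ID_2", "Money Penny Goes For Broke", "Burt Bacharach", 1967, 2 / 3),
]


def drop_query_song(rows, query_id):
    """Return the rows whose song ID differs from the query song's ID."""
    return [row for row in rows if row[0] != query_id]


playlist = drop_query_song(rows, QUERY_ID)
for _, title, artist, year, score in playlist:
    print(title, "|", artist, "|", year, "|", round(score, 4))
```

Filtering on the song ID rather than the title keeps covers and remasters that share a title with the query song, which may be recommendations worth showing.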

In [75]:
find_similar_songs(songchoice_id, 'dec_50s', 10)
Song of choice:  Can't Help Falling In Love
Artist:  Elvis Presley
Year:  1961

Your next playlist includes 10 songs from decade dec_50s.
+---------------------+-------------------------------+----+-------------------+
|title                |artist_name                    |year|jaccard_similarity |
+---------------------+-------------------------------+----+-------------------+
|Baby Face            |Little Richard                 |1958|0.6666666666666666 |
|The ABC's Of Love    |Frankie Lymon And The Teenagers|1956|0.6666666666666666 |
|Problems             |The Everly Brothers            |1958|0.6666666666666666 |
|September In The Rain|Julie London                   |1956|0.6666666666666666 |
|Rebel Rouser         |Duane Eddy                     |1958|0.42857142857142855|
|S'il Te Faut         |Jacques Brel                   |1954|0.42857142857142855|
|KC Loving            |Little Willie Littlefield      |1952|0.42857142857142855|
|At The Hop           |Danny & The Juniors            |1957|0.42857142857142855|
|Mr. Lee              |The Bobbettes                  |1957|0.42857142857142855|
|Whirlaway            |Allen Toussaint                |1958|0.42857142857142855|
+---------------------+-------------------------------+----+-------------------+
only showing top 10 rows

In [76]:
find_similar_songs(songchoice_id, 'dec_40s', 10)
Song of choice:  Can't Help Falling In Love
Artist:  Elvis Presley
Year:  1961

Your next playlist includes 10 songs from decade dec_40s.
+----------------------------------+----------------------------------------------------+----+-------------------+
|title                             |artist_name                                         |year|jaccard_similarity |
+----------------------------------+----------------------------------------------------+----+-------------------+
|Whoopin' the Blues                |Sonny Terry                                         |1945|0.42857142857142855|
|Big Fine Girl                     |Wilbur deParis_ Jimmy Witherspoon                   |1949|0.42857142857142855|
|Old Maid Boogie                   |Eddie "Cleanhead" Vinson                            |1947|0.42857142857142855|
|Aberdeen Mississippi Blues        |Bukka White                                         |1940|0.25               |
|Manteca                           |Dizzy Gillespie & His Orchestra;Luciano "Chano" Pozo|1948|0.25               |
|The Fat Man                       |Fats Domino                                         |1949|0.25               |
|That's All Right                  |Arthur "Big Boy" Crudup                             |1946|0.25               |
|I Love You Because                |Leon Payne                                          |1949|0.25               |
|I Still Get a Thrill              |Harry Belafonte                                     |1949|0.25               |
|Strange Things Happening Every Day|Sister Rosetta Tharpe                               |1945|0.25               |
+----------------------------------+----------------------------------------------------+----+-------------------+
only showing top 10 rows

In [77]:
find_similar_songs(songchoice_id, 'dec_30s', 10)
Song of choice:  Can't Help Falling In Love
Artist:  Elvis Presley
Year:  1961

Your next playlist includes 10 songs from decade dec_30s.
+---------------------------+------------------------------------+----+-------------------+
|title                      |artist_name                         |year|jaccard_similarity |
+---------------------------+------------------------------------+----+-------------------+
|Turpentine Blues           |Tampa Red                           |1932|0.42857142857142855|
|Rootin' Ground Hog         |Big Joe Williams                    |1937|0.25               |
|Blue Lou                   |Fletcher Henderson And His Orchestra|1936|0.25               |
|Police Station Blues       |Peetie Wheatstraw                   |1932|0.25               |
|Dry Well Blues             |Charley Patton                      |1931|0.25               |
|Cocaine Habit Blues        |Memphis Jug Band                    |1930|0.1111111111111111 |
|When Your Way Gets Dark    |Charlie Patton                      |1930|0.1111111111111111 |
|Bear Cat's Kitten          |Tampa Red                           |1930|0.1111111111111111 |
|Tuxedo Junction            |Erskine Hawkins and His Orchestra   |1939|0.1111111111111111 |
|Sitting On Top Of The World|Mississippi Sheiks                  |1930|0.1111111111111111 |
+---------------------------+------------------------------------+----+-------------------+
only showing top 10 rows

In [78]:
find_similar_songs(songchoice_id, 'dec_20s', 10)
Song of choice:  Can't Help Falling In Love
Artist:  Elvis Presley
Year:  1961

Your next playlist includes 10 songs from decade dec_20s.
+-----------------------------------------+---------------------+----+------------------+
|title                                    |artist_name          |year|jaccard_similarity|
+-----------------------------------------+---------------------+----+------------------+
|Ain't misbehavin'                        |Fats Waller          |1929|0.25              |
|Down The Dirt Road Blues                 |Charley Patton       |1929|0.25              |
|Just Because                             |Nelstone´s Hawaiians |1929|0.25              |
|Corn Liquor Blues                        |Papa Charlie Jackson |1929|0.25              |
|Broke And Hungry                         |Blind Lemon Jefferson|1927|0.1111111111111111|
|West Coast Blues                         |Blind Blake          |1926|0.1111111111111111|
|Judge Harsh Blues  (Alternate take)      |Furry Lewis          |1928|0.1111111111111111|
|The Prisoner's Song                      |Vernon Dalhart       |1924|0.1111111111111111|
|Nobody Knows You When You're Down And Out|Bessie Smith         |1929|0.1111111111111111|
|Henry Ford Blues                         |Roosevelt Sykes      |1929|0.1111111111111111|
+-----------------------------------------+---------------------+----+------------------+
only showing top 10 rows

Because of You by Kelly Clarkson

In [84]:
songchoice_id = 'SOUWYEZ12D0219189A'
find_similar_songs(songchoice_id, 'dec_2k1s', 20)
Song of choice:  Because Of You
Artist:  Kelly Clarkson
Year:  2004

Your next playlist includes 20 songs from decade dec_2k1s.
+---------------------------------------+------------------+----+------------------+
|title                                  |artist_name       |year|jaccard_similarity|
+---------------------------------------+------------------+----+------------------+
|Beyond A Rock                          |Dreadzone         |2010|1.0               |
|Ragged Mile                            |John Butler Trio  |2010|1.0               |
|Mai Dimenticata                        |Valerio Scanu     |2010|1.0               |
|The He Man Woman Haters Club           |From First to Last|2010|1.0               |
|Rainbow                                |Radar Brothers    |2010|0.6666666666666666|
|Speckles Shine feat. Guillermo E. Brown|Jahcoozi          |2010|0.6666666666666666|
|La Noche De Que Te Hablé               |Celtas Cortos     |2010|0.6666666666666666|
|Just a Little Bit of Love              |Ocean Colour Scene|2010|0.6666666666666666|
|Booty Pills                            |Les Petits Pilous |2010|0.6666666666666666|
|Dancing With Girls                     |General Fiasco    |2010|0.6666666666666666|
|Sunshine                               |Macabre Unit      |2010|0.6666666666666666|
|No Doubt About It                      |Gentleman         |2010|0.6666666666666666|
|Luv Letter feat. Ms Whitney            |Inspectah Deck    |2010|0.6666666666666666|
|April Fool                             |Matt Monro        |2010|0.6666666666666666|
|Look Over Yonders Wall                 |Joe Bonamassa     |2010|0.6666666666666666|
|Warm Welcome                           |Silver Columns    |2010|0.6666666666666666|
|Miami                                  |Oddisee           |2010|0.6666666666666666|
|Lake Superior                          |Jason Collett     |2010|0.6666666666666666|
|Hit The Ground Running                 |Keel              |2010|0.6666666666666666|
|Fisherman Style                        |Captain Sinbad    |2010|0.6666666666666666|
+---------------------------------------+------------------+----+------------------+
only showing top 20 rows

In [85]:
find_similar_songs(songchoice_id, 'dec_2ks', 10)
Song of choice:  Because Of You
Artist:  Kelly Clarkson
Year:  2004

Your next playlist includes 10 songs from decade dec_2ks.
+--------------------------------+------------------------------+----+------------------+
|title                           |artist_name                   |year|jaccard_similarity|
+--------------------------------+------------------------------+----+------------------+
|Let Me Hit That                 |Three 6 Mafia feat. Boogiemane|2005|1.0               |
|Ich liebe dich                  |Nathalie Tineo                |2006|1.0               |
|It's The Heart That Matters Most|Charlotte Church              |2002|1.0               |
|Harmaata lunta                  |Gimmel                        |2003|1.0               |
|Voices (Psy'Aviah RMX)          |Helalyn Flowers               |2007|1.0               |
|Cellophane (Album Version)      |Amanda Ghost                  |2000|1.0               |
|Clementine                      |Washington                    |2009|1.0               |
|Poslednii Dyim                  |Messer Chups                  |2005|1.0               |
|Killing Time (Vix Remix)        |Whispers In The Shadow        |2009|1.0               |
|The Pressure Part 1             |Sounds Of Blackness           |2003|1.0               |
+--------------------------------+------------------------------+----+------------------+
only showing top 10 rows

In [86]:
find_similar_songs(songchoice_id, 'dec_90s', 20)
Song of choice:  Because Of You
Artist:  Kelly Clarkson
Year:  2004

Your next playlist includes 20 songs from decade dec_90s.
+----------------------------------+---------------------------+----+------------------+
|title                             |artist_name                |year|jaccard_similarity|
+----------------------------------+---------------------------+----+------------------+
|Write It On Your Hand (LP Version)|Marvelous 3                |1998|1.0               |
|Through The Night                 |Mad Heads                  |1996|1.0               |
|Tell Me About It                  |Marshall Crenshaw          |1999|1.0               |
|Knockin' On Heaven's Door         |Selig                      |1999|1.0               |
|The Hi-De-Ho Man                  |Cab Calloway_ His Orchestra|1999|1.0               |
|LIKE A PRISONER                   |Skagarack                  |1990|1.0               |
|Mango Cool                        |Los Amigos Invisibles      |1998|1.0               |
|Barcelona (Album Version)         |The Rentals                |1999|1.0               |
|Dinosaurs (LP Version)            |King Missile               |1991|1.0               |
|Ilta yöhön kuljettaa              |Anna Eriksson              |1999|1.0               |
|Greetings                         |Yabby U                    |1994|1.0               |
|Sing Hallelujah!                  |Dr. Alban                  |1992|1.0               |
|Stay At Home (Album Version)      |Too Much Joy               |1992|1.0               |
|Anita                             |Costa Cordalis             |1991|1.0               |
|Stop The Rock                     |Apollo 440                 |1999|1.0               |
|Try                               |God Lives Underwater       |1995|1.0               |
|Believe It Or Not                 |Don Covay                  |1994|1.0               |
|Copycat                           |The Cranberries            |1999|1.0               |
|Nowhere To Hide                   |Antiloop                   |1997|1.0               |
|The Fire                          |Chely Wright               |1999|1.0               |
+----------------------------------+---------------------------+----+------------------+
only showing top 20 rows

In [87]:
find_similar_songs(songchoice_id, 'dec_80s', 10)
Song of choice:  Because Of You
Artist:  Kelly Clarkson
Year:  2004

Your next playlist includes 10 songs from decade dec_80s.
+-------------------------------------+--------------+----+------------------+
|title                                |artist_name   |year|jaccard_similarity|
+-------------------------------------+--------------+----+------------------+
|At The Movies (1991 Digital Remaster)|Bad Brains    |1983|1.0               |
|Sally Brown                          |Bad Manners   |1988|1.0               |
|Talk Talk (1997 Digital Remaster)    |Talk Talk     |1982|1.0               |
|Possible Straight (Album Version)    |Lyle Mays     |1988|1.0               |
|Que Vamos A Hacer                    |Los Ronaldos  |1988|1.0               |
|Let Me Be The One                    |Angela Bofill |1984|1.0               |
|Everything I Own                     |Ken Boothe    |1987|1.0               |
|I Thought It Took A Little Time      |Stacy Lattisaw|1985|1.0               |
|Every Man Has A Right                |Cultural Roots|1984|1.0               |
|Miracle Man                          |Ozzy Osbourne |1988|1.0               |
+-------------------------------------+--------------+----+------------------+
only showing top 10 rows

In [88]:
find_similar_songs(songchoice_id, 'dec_70s', 10)
Song of choice:  Because Of You
Artist:  Kelly Clarkson
Year:  2004

Your next playlist includes 10 songs from decade dec_70s.
+-----------------------------+--------------------+----+------------------+
|title                        |artist_name         |year|jaccard_similarity|
+-----------------------------+--------------------+----+------------------+
|I Play And Sing              |Tony Orlando & Dawn |1971|1.0               |
|Swaheto Woman                |David Johansen      |1979|1.0               |
|Good Lovin'                  |It's A Beautiful Day|1970|1.0               |
|Rubber Biscuit               |The Blues Brothers  |1978|1.0               |
|Guess Who's Coming To Dinner |Black Uhuru         |1979|1.0               |
|Telegram Sam                 |T-Rex               |1972|1.0               |
|In From The Storm            |Jimi Hendrix        |1971|1.0               |
|My Coo Ca Choo               |Alvin Stardust      |1974|1.0               |
|(They Are) Rollerskating     |Dolly Dots          |1979|0.6666666666666666|
|Love Really Hurts Without You|Billy Ocean         |1975|0.6666666666666666|
+-----------------------------+--------------------+----+------------------+
only showing top 10 rows

In [89]:
find_similar_songs(songchoice_id, 'dec_60s', 10)
Song of choice:  Because Of You
Artist:  Kelly Clarkson
Year:  2004

Your next playlist includes 10 songs from decade dec_60s.
+---------------------------------------+-------------------+----+------------------+
|title                                  |artist_name        |year|jaccard_similarity|
+---------------------------------------+-------------------+----+------------------+
|Just Like A Rose                       |? & The Mysterians |1967|1.0               |
|Piercing the unknown                   |The Spotnicks      |1966|1.0               |
|Harpsichord Shuffle                    |Wynder K. Frog     |1968|1.0               |
|Helpless                               |Kim Weston         |1966|1.0               |
|Runaway                                |Del Shannon        |1961|1.0               |
|Scandal in a Brixton Market            |Laurel Aitken      |1969|1.0               |
|Then He Kissed Me                      |The Crystals       |1963|0.6666666666666666|
|2-4-2 Fox Trot (The Lear Jet Song)     |The Byrds          |1966|0.6666666666666666|
|Little Darling (I Need You)            |Marvin Gaye        |1966|0.6666666666666666|
|I Don't Play_ I'll Be Your Man Some Day|Charles Musselwhite|1969|0.6666666666666666|
+---------------------------------------+-------------------+----+------------------+
only showing top 10 rows

In [90]:
find_similar_songs(songchoice_id, 'dec_50s', 10)
Song of choice:  Because Of You
Artist:  Kelly Clarkson
Year:  2004

Your next playlist includes 10 songs from decade dec_50s.
+------------------------------+---------------+----+------------------+
|title                         |artist_name    |year|jaccard_similarity|
+------------------------------+---------------+----+------------------+
|Old Maid                      |Big Bopper     |1959|1.0               |
|Come Go With Me               |The Del Vikings|1957|0.6666666666666666|
|Almost Grown                  |Chuck Berry    |1959|0.6666666666666666|
|How Are Ya' Fixed For Love?   |Frank Sinatra  |1958|0.6666666666666666|
|Way Down Yonder In New Orleans|Freddy Cannon  |1959|0.6666666666666666|
|Hungry for Love               |Patsy Cline    |1957|0.6666666666666666|
|Pick Me Up On Your Way Down   |Charlie Walker |1958|0.6666666666666666|
|Susie Q                       |Dale Hawkins   |1957|0.6666666666666666|
|Oh I Apologize                |Barrett Strong |1959|0.6666666666666666|
|La Bamba                      |Ritchie Valens |1958|0.6666666666666666|
+------------------------------+---------------+----+------------------+
only showing top 10 rows

In [91]:
find_similar_songs(songchoice_id, 'dec_40s', 10)
Song of choice:  Because Of You
Artist:  Kelly Clarkson
Year:  2004

Your next playlist includes 10 songs from decade dec_40s.
+----------------------------------+----------------------------------------------------+----+-------------------+
|title                             |artist_name                                         |year|jaccard_similarity |
+----------------------------------+----------------------------------------------------+----+-------------------+
|Old Maid Boogie                   |Eddie "Cleanhead" Vinson                            |1947|0.6666666666666666 |
|Embraceable You                   |Frank Sinatra                                       |1947|0.42857142857142855|
|Manteca                           |Dizzy Gillespie & His Orchestra;Luciano "Chano" Pozo|1948|0.42857142857142855|
|The Fat Man                       |Fats Domino                                         |1949|0.42857142857142855|
|Tennessee Saturday Night          |Red Foley                                           |1948|0.42857142857142855|
|I Still Get a Thrill              |Harry Belafonte                                     |1949|0.42857142857142855|
|Strange Things Happening Every Day|Sister Rosetta Tharpe                               |1945|0.42857142857142855|
|Dig This Boogie                   |Wynonie Harris                                      |1946|0.42857142857142855|
|Whoopin' the Blues                |Sonny Terry                                         |1945|0.42857142857142855|
|Old Maid Boogie                   |Eddie Vinson                                        |1947|0.42857142857142855|
+----------------------------------+----------------------------------------------------+----+-------------------+
only showing top 10 rows

In [92]:
find_similar_songs(songchoice_id, 'dec_30s', 10)
Song of choice:  Because Of You
Artist:  Kelly Clarkson
Year:  2004

Your next playlist includes 10 songs from decade dec_30s.
+----------------------------------+---------------------------------------+----+-------------------+
|title                             |artist_name                            |year|jaccard_similarity |
+----------------------------------+---------------------------------------+----+-------------------+
|Sitting On Top Of The World       |Mississippi Sheiks                     |1930|0.42857142857142855|
|Don't You Leave Me Here           |Jelly Roll Morton's New Orleans Jazzmen|1939|0.42857142857142855|
|Tuxedo Junction                   |Erskine Hawkins & His Orchestra        |1939|0.42857142857142855|
|Searching The Desert For The Blues|Blind Willie McTell                    |1932|0.42857142857142855|
|Cocaine Habit Blues               |Memphis Jug Band                       |1930|0.25               |
|Blue Lou                          |Fletcher Henderson And His Orchestra   |1936|0.25               |
|Jeepers creepers                  |Louis Armstrong                        |1939|0.25               |
|Turpentine Blues                  |Tampa Red                              |1932|0.25               |
|I've Got My Love To Keep Me Warm  |Billie Holiday                         |1937|0.25               |
|Old Shep                          |Red Foley                              |1936|0.25               |
+----------------------------------+---------------------------------------+----+-------------------+
only showing top 10 rows

In [93]:
find_similar_songs(songchoice_id, 'dec_20s', 10)
Song of choice:  Because Of You
Artist:  Kelly Clarkson
Year:  2004

Your next playlist includes 10 songs from decade dec_20s.
+-----------------------------------------+----------------------+----+-------------------+
|title                                    |artist_name           |year|jaccard_similarity |
+-----------------------------------------+----------------------+----+-------------------+
|Broke And Hungry                         |Blind Lemon Jefferson |1927|0.42857142857142855|
|The Prisoner's Song                      |Vernon Dalhart        |1924|0.42857142857142855|
|Goin' Places                             |Joe Venuti_ Eddie Lang|1927|0.25               |
|Nobody Knows You When You're Down And Out|Bessie Smith          |1929|0.25               |
|Ain't misbehavin'                        |Fats Waller           |1929|0.25               |
|Down The Dirt Road Blues                 |Charley Patton        |1929|0.25               |
|Bedtime Blues                            |Frank Stokes          |1928|0.25               |
|He's Got Me Goin'                        |Bessie Smith          |1929|0.25               |
|Just Because                             |Nelstone´s Hawaiians  |1929|0.25               |
|Corn Liquor Blues                        |Papa Charlie Jackson  |1929|0.25               |
+-----------------------------------------+----------------------+----+-------------------+
only showing top 10 rows
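
The playlists above can be summarized by averaging the displayed scores per decade. The following is a minimal sketch in plain Python, not the notebook's own code: the (score, count) pairs are copied by hand from the top-10 rows shown above for "Because Of You" (2004), and the names `shown_scores` and `mean_score` are hypothetical.

```python
# (score, count) pairs copied from the top-10 rows displayed above.
# 2/3, 3/7 and 0.25 are the exact fractions behind the printed decimals.
shown_scores = {
    "dec_80s": [(1.0, 10)],
    "dec_70s": [(1.0, 8), (2 / 3, 2)],
    "dec_60s": [(1.0, 6), (2 / 3, 4)],
    "dec_50s": [(1.0, 1), (2 / 3, 9)],
    "dec_40s": [(2 / 3, 1), (3 / 7, 9)],
    "dec_30s": [(3 / 7, 4), (0.25, 6)],
    "dec_20s": [(3 / 7, 2), (0.25, 8)],
}


def mean_score(pairs):
    """Average score over a list of (score, count) pairs."""
    total = sum(score * count for score, count in pairs)
    rows = sum(count for _, count in pairs)
    return total / rows


decade_means = {decade: mean_score(pairs) for decade, pairs in shown_scores.items()}

for decade, value in decade_means.items():
    print(f"{decade}: {value:.3f}")
```

For this one query song, the mean of the displayed scores does not increase at any step from the 1980s back to the 1920s. Only the rows shown are averaged, so this describes the top of each playlist rather than the whole decade.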

Summary of Results and Insights

Results

Insights

  • About half of the songs in the 100k-song subset the team used had no year label. This does not break the recommender, but it limits how many songs can be recommended from a given decade. It may also introduce bias if a large share of the unlabeled songs belong to a single genre.

  • The recommender that uses Cosine similarity tends to suggest rock music, which may reflect the dataset's bias towards rock. In the tests, the recommender that uses Jaccard similarity was instead biased towards softer music. One possible explanation is the way the team binned each feature; a more granular binning process might make the model more accurate.

  • Based on the observed trend in song similarity using the Jaccard index, this technique could be used to estimate the decade to which a song most likely belongs.

  • Using the entire 1 million song dataset may improve the model by increasing the available song choices.

  • Even when the genres of the recommended songs did not match those of the songs the respondents chose, the respondents still found inherent similarities and, on average, still liked the recommendations.
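
The effect of binning granularity on the Jaccard index can be illustrated with a small sketch. It is not the team's implementation: the feature names, the feature values, the scaling to [0, 1), and the helper `bin_features` are all hypothetical, chosen only to show how coarse bins can make two different songs look identical.

```python
def bin_features(features, n_bins):
    """Turn features scaled to [0, 1) into tokens such as 'tempo_2'."""
    return {
        f"{name}_{min(int(value * n_bins), n_bins - 1)}"
        for name, value in features.items()
    }


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical, pre-scaled feature values for two songs (assumption).
query = {"tempo": 0.55, "loudness": 0.72, "duration": 0.41,
         "energy": 0.62, "danceability": 0.31}
candidate = {"tempo": 0.58, "loudness": 0.74, "duration": 0.46,
             "energy": 0.66, "danceability": 0.48}

# Coarse bins: every feature falls into the same bin for both songs.
coarse = jaccard(bin_features(query, 4), bin_features(candidate, 4))
# Finer bins: 'danceability' now lands in different bins.
fine = jaccard(bin_features(query, 10), bin_features(candidate, 10))
print(coarse, fine)

# With n tokens per song and i shared tokens, the index is i / (2n - i).
possible_scores = [i / (2 * 5 - i) for i in range(6)]
print(possible_scores)
```

With four bins per feature the two songs share all five tokens and score 1.0; with ten bins they share four tokens and score 4/6. The last line lists the only scores that can occur when each song has five tokens. The values 0.25, 3/7, 4/6 and 1.0 match the scores seen in the playlists, which is consistent with, but does not confirm, five binned features per song.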

References

Million Song Dataset. Retrieved from https://labrosa.ee.columbia.edu/millionsong/

Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.

Global Music Report 2018. https://www.ifpi.org/downloads/GMR2018.pdf

The Role of Music in Human Culture. https://thoughteconomics.com/the-role-of-music-in-human-culture/

Acknowledgments

Babiera, Johniel & Nebres, Elisa. Who's Hott? Anatomy of the hottest artists in the music industry. Asian Institute of Management. 2018.

Special thanks to our respondents:
Elisa Nebres
Johniel Babiera
Earl Abraham Aian Rosales
Jude Teves
Josh Hiwatig
AC Arcin
Patricia Manasan
Chichan Soriano
Jon Colipapa
Bingbong Recto
Miguel Valdez